CONSERVED AND SPECIFIC STREPTOCOCCAL GENOMES



CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority of U.S. provisional patent application Serial No. 
60/406,237, filed August 26, 2002, U.S. provisional patent application Serial No. 60/406,676, 
filed August 27, 2002 and U.S. provisional patent application Serial No. 60/406,757, filed 
August 28, 2002. 

FIELD OF THE INVENTION

The invention relates to polynucleotides which are conserved or specific to one or more
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. The
conserved or specific genomic regions can be used to identify, screen and develop vaccines and
other treatments for Streptococcal infections and can be used in diagnostic assays to diagnose
and identify Streptococcal infections.

BACKGROUND OF THE INVENTION 

The genus Streptococcus consists of Gram-positive, chain-forming, spherical bacterial
cells. Three species of clinical interest are S. pneumoniae ("pneumococcus" or "S. pn."),
S. pyogenes ("group A streptococcus" or "GAS") and S. agalactiae ("group B streptococcus" or
"GBS"). Infections with these three pathogenic streptococci lead to conditions including
pharyngitis, toxic shock syndrome and necrotizing fasciitis.

Once thought to infect only cows, GBS is now known to cause serious disease,
bacteraemia and meningitis in immunocompromised individuals and neonates. There are two
known types of neonatal infection. The first (early onset, usually within 5 days of birth) is
manifested by bacteraemia and infection. It is generally contracted vertically as a baby passes
through the birth canal. GBS is thought to colonize the vagina of about 25% of young women;
approximately 1% of infants born via a vaginal birth to colonised mothers will become infected.
Mortality resulting from these infections is between 50 and 70%. The second type of neonatal
infection is a meningitis that occurs 10 to 60 days after birth. If pregnant women are vaccinated
with type III capsule so that the infants are passively immunised, the incidence of the late onset
meningitis is generally reduced, although not entirely eliminated.

The "B" in "GBS" refers to the Lancefield classification, which is based on the
antigenicity of a carbohydrate which is soluble in dilute acid and called the C carbohydrate.
Lancefield identified 13 types of C carbohydrate, designated A to O, that could be serologically
differentiated. The organisms that most commonly infect humans are found in groups A, B, D,
and G. Within group B, strains can be divided into at least 9 serotypes (Ia, Ib, II, III, IV, V, VI,
VII, and VIII) based on the structure of their polysaccharide capsule. Further categories based
on, for example, the expression of certain proteins have also been developed.

GBS strains of polysaccharide capsule Type V were rarely isolated before the mid-1980s
but now account for approximately one-third of clinical isolates in the US. Type V is the most
common capsular serotype associated with invasive infection in nonpregnant adults, and the
emergence of Type V strains over the past decade has been temporally linked to an increase in
GBS disease in this population.

Group A streptococcus is a frequent human pathogen, estimated to be present in between
5 and 15% of normal individuals without signs of disease. When host defences are compromised,
or when the organism is able to exert its virulence, or when it is introduced into vulnerable
tissues or hosts, however, an acute infection occurs. Diseases include puerperal fever, scarlet
fever, erysipelas, pharyngitis, impetigo, necrotising fasciitis, myositis and streptococcal toxic
shock syndrome.

Pneumococcus is the most common cause of acute respiratory infection and otitis media
and is estimated to result in over 3 million deaths in children every year worldwide from
pneumonia, bacteremia, or meningitis. Even more deaths occur among elderly people, among
whom S. pn. is the leading cause of community-acquired pneumonia and meningitis. Since
1990, the number of penicillin-resistant strains has increased from 1 to 5% to 25 to 80% of
isolates, and many strains are now resistant to commonly prescribed antibiotics such as
penicillin, macrolides, and fluoroquinolones. See Tettelin, et al. (2001) Science 293, 248-506.

The complete genomic sequence of a virulent isolate of S. pneumoniae was published by
Tettelin, et al. (2001) Science 293, 248-506 and is available at the TIGR website at
http://www.tigr.org, as well as on GenBank (available through the PubMed website at
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi). The genomic sequence, the Tettelin article and
its published supplemental material are incorporated herein by reference in their entirety.

The complete genomic sequence of an M1 strain of S. pyogenes was published by
Ferretti, et al. (2001) Proc. Natl. Acad. Sci. USA 98, 4658 - 4663 and is available at the TIGR
website at http://www.tigr.org. The genomic sequence, the Ferretti article and its published
supplemental materials are incorporated herein by reference in their entirety.

The complete genomic sequence of a serotype V strain of S. agalactiae (type V strain
2603 V/R) was published on August 28, 2002 at GenBank Accession no. AE009948 (available
through PubMed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) and/or was available on the
same day at the TIGR website at http://www.tigr.org. Most of this sequence is also available in
PCT International Patent Application Publication WO 02/34771. The genomic sequence, the
Tettelin article and its published supplemental materials are incorporated herein by reference in
their entirety.

Current treatments for Streptococcal infections include both antibiotics and prophylactic
vaccination. Current vaccines, particularly with respect to GBS, suffer from poor
immunogenicity, while the emergence of antibiotic-resistant strains has lessened the
effectiveness of currently used antibiotics. Accordingly, there is an increasing need for the
development of new vaccines and antibiotics (as well as other small molecule bacterial
inhibitors) to help prevent and treat Streptococcal infections.

Applicants have identified regions of the Streptococcal genomes which can be used to
identify and develop new vaccines and treatments for Streptococcal infections. Specifically,
Applicants have identified polynucleotides of the Streptococcal genome which are conserved or
specific to Streptococcal species, species serotypes, and/or specific serotype isolates. These
polynucleotides and their expressed polypeptides can be used to screen, develop and design new
vaccines, antibiotics and other small molecule bacterial inhibitors. These polynucleotides and
their expressed polypeptides can further be used to diagnose and identify Streptococcal infections.

SUMMARY OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to one or more
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. In particular,
the invention relates to polynucleotides from Streptococcus which are conserved or specific to
one or more of the species of S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group
A streptococcus" or "GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The
invention further relates to polynucleotides which are conserved or specific to one or more
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII, and VIII.
The invention still further relates to polynucleotides which are conserved or specific to one or
more clinical isolates of a Streptococcus species.

The invention is based on the identification of the following Subsets of genes. Genes 
falling within each subset are described with respect to referenced tables, lists, and/or figures (in 
particular the CGH map depicted in Figure 1). 

The following Subsets relate to the GBS genome:

GBS Subset 1: 1060 GBS genes which have homologues with GAS and with
pneumococcus (Table 8);

GBS Subset 2: 225 GBS genes which have homologues with GAS, but not with
pneumococcus (Table 10);

GBS Subset 3: 176 GBS genes which have homologues with pneumococcus but not
with GAS (Table 9);

GBS Subset 4: 683 GBS genes which do not have homologues with GAS or
pneumococcus (specific to GBS vs GAS and pneumococcus) (Table 11).
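
The four-way partition described above can be illustrated with a minimal sketch. It assumes that each GBS gene already carries two yes/no calls (homologue found in GAS, homologue found in pneumococcus); how those calls are made is not shown here. The function name `classify_gbs_gene` and the toy ORF names are hypothetical and are used for illustration only. The subset sizes are the figures given in the list above.

```python
from collections import Counter


def classify_gbs_gene(has_gas_homologue: bool, has_spn_homologue: bool) -> int:
    """Return the GBS Subset number (1 to 4) for a single gene.

    Hypothetical helper: the two booleans are assumed to come from a
    homology search performed elsewhere.
    """
    if has_gas_homologue and has_spn_homologue:
        return 1  # shared with GAS and pneumococcus
    if has_gas_homologue:
        return 2  # shared with GAS only
    if has_spn_homologue:
        return 3  # shared with pneumococcus only
    return 4      # specific to GBS


# Toy input (invented names): ORF -> (homologue in GAS?, homologue in pneumococcus?)
toy_calls = {
    "orf_a": (True, True),
    "orf_b": (True, False),
    "orf_c": (False, True),
    "orf_d": (False, False),
    "orf_e": (True, True),
}
subset_sizes = Counter(classify_gbs_gene(*calls) for calls in toy_calls.values())

# Subset sizes as stated in the text; because the four subsets are mutually
# exclusive, their sum is the number of GBS genes that were classified.
reported_sizes = {1: 1060, 2: 225, 3: 176, 4: 683}
total_classified = sum(reported_sizes.values())
```

Because every gene falls into exactly one of the four subsets, the reported sizes can be summed directly to obtain the number of GBS genes covered by Tables 8 to 11.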

The invention is based on the identification of the following subsets of genes within the
GAS genome:

GAS Subset 1: 1006 GAS genes which have homologues with GBS and with
pneumococcus (Table 33);

GAS Subset 2: 212 GAS genes which have homologues with GBS but do not have
homologues with pneumococcus (Table 34);

GAS Subset 3: 62 GAS genes which have homologues with pneumococcus but do not
have homologues with GBS (Table 35);

GAS Subset 4: 416 GAS genes which do not have homologues with either GBS or
pneumococcus. This Subset can be determined by subtracting the above Subsets from the
published genome.

The invention is based on the identification of the following subsets of genes within the 
pneumococcus genome: 

Spn Subset 1: 1034 Spn genes which have homologues with GBS and GAS (Table 36);

Spn Subset 2: 195 Spn genes which have homologues with GBS but do not have
homologues with GAS (Table 37);

Spn Subset 3: 74 Spn genes which have homologues with GAS but do not have
homologues with GBS (Table 38);

Spn Subset 4: 836 Spn genes which do not have homologues with either GBS or
GAS. This Subset can be determined by subtracting the above Subsets from the
published genome.

The invention further provides polynucleotides which are conserved or specific to 
Streptococcus based on a comparison with a wide range of published bacterial genomes. The 
following additional Subsets are provided: 

GBS Subset 1(a): Of the 1060 GBS genes which have homologues in both GAS and
pneumococcus, 12 of those GBS genes do not have homologues with any of the other published
bacterial genomes at the time of the invention (i.e., GBS Subset 1(a) is specific to Streptococcus
vs non-Streptococcus published genomes). (The 12 GBS ORFs are listed in Table 3).

GBS Subset 2(a): This Subset comprises GBS genes which have homologues with
GAS, but not with pneumococcus or any other published bacterial genomes at the time of the
invention.

GBS Subset 3(a): This Subset comprises GBS genes which have homologues with
pneumococcus, but not with GAS or any other published bacterial genomes at the time of the
invention.

GBS Subset 4(a): Of the 683 GBS genes which do not have homologues in either GAS
or pneumococcus, 315 of these GBS genes also do not have homologues with any of the other
published bacterial genomes. These include six proteins predicted to be anchored on the cell
wall (SAG0677, SAG0771, SAG1052, SAG1331, SAG1473, and SAG1168), three of the
capsule-related genes (SAG1163, SAG1167, and SAG1168), six transcriptional regulators, and
four genes of the cyl operon (SAG0663 - SAG0673) essential for GBS hemolytic activity and
production of pigment. See Pritzlaff et al. (2001) Mol. Microbiol. 39, 236 - 247. The rest of the
315 proteins include 240 hypothetical proteins with no similarity to other proteins in databases.

Many of the 315 genes specific to S. agalactiae are located in regions likely to constitute
mobile genetic elements. Two of these regions resemble prophages (SAG0545-SAG0610 and
SAG1835-SAG1885) displaying a mosaic structure with segments most similar to different
bacteriophages, a pattern that suggests frequent recombination events. PblA and PblB are
adhesins from a S. mitis prophage where they contribute to endocarditis by binding to human
platelets (See Bensing, et al. (2001) Infect. Immun. 69, 6186 - 6192; Bensing, et al. (2001) Infect.
Immun. 69, 1373 - 1380). Their orthologs in S. agalactiae are located on separate prophages and
display a different protein structure. Another region (SAG1247-SAG1299) encodes a putative
conjugative transposon that carries genes for cadmium efflux and mercury resistance.

GAS Subset 1(a): This Subset comprises GAS genes which have homologues with GBS
and with pneumococcus, but do not have homologues with any of the other published bacterial
genomes at the time of the invention.

GAS Subset 2(a): This Subset comprises GAS genes which have homologues with GBS 
but do not have homologues with pneumococcus or any of the other published bacterial genomes 
at the time of the invention; 

GAS Subset 3(a): This Subset comprises GAS genes which have homologues with
pneumococcus but do not have homologues with GBS or any of the other published bacterial
genomes at the time of the invention.

GAS Subset 4(a): This Subset comprises GAS genes which do not have homologues
with either GBS or pneumococcus or with any of the other published bacterial genomes at the
time of the invention.

Spn Subset 1(a): This Subset comprises Spn genes which have homologues with GBS
and GAS but which do not have homologues with any of the other published bacterial genomes
at the time of the invention;

Spn Subset 2(a): This Subset comprises Spn genes which have homologues with GBS
but do not have homologues with GAS or with any of the other published bacterial genomes at
the time of the invention;

Spn Subset 3(a): This Subset comprises Spn genes which have homologues with GAS
but do not have homologues with GBS or with any of the other published bacterial genomes at
the time of the invention;

Spn Subset 4(a): This Subset comprises Spn genes which do not have homologues with
either GBS or GAS or with any of the other published bacterial genomes at the time of
the invention.

The invention also provides polynucleotides which are conserved or specific to GBS
serotypes and/or clinical isolates. Applicants have sequenced 19 GBS genes from a variety of
GBS serotypes in 11 different clinical isolates. The sequences of these genes and their
alignments are set forth in Tables 13 - 31. Polynucleotide and polypeptide sequences which are
specific or conserved across one or more clinical isolates can be identified using these
alignments. The following additional subsets are provided:

GBS Subset 1(b): of the 1060 GBS genes which have homologues with GAS and with
pneumococcus, 47 of these GBS genes vary among the 11 clinical isolates (GBS Subset 1(b)(i)).
1013 of these GBS genes are conserved across the 11 clinical isolates (GBS Subset 1(b)(ii)).
These lists can be determined by comparing the genes listed in Table 8 with the Comparative
Genome Hybridization in Figure 1.

GBS Subset 2(b): of the 225 GBS genes which have homologues with GAS, but not
pneumococcus, 44 of these GBS genes vary among the 11 clinical isolates (GBS Subset 2(b)(i)).
181 of these GBS genes are conserved across the 11 clinical isolates (GBS Subset 2(b)(ii)).
These lists can be determined by comparing the genes listed in Table 10 with the Comparative
Genome Hybridization in Figure 1.

GBS Subset 3(b): of the 176 GBS genes which have homologues with pneumococcus,
44 of these GBS genes vary among the 11 clinical isolates (GBS Subset 3(b)(i)). 132 of these GBS
genes are conserved across the 11 clinical isolates (GBS Subset 3(b)(ii)). This list can be
determined by comparing the genes listed in Table 9 with the Comparative Genome
Hybridization in Figure 1.

GBS Subset 4(b): of the 683 GBS genes which do not have homologues with GAS or
pneumococcus, 260 GBS genes vary among the 11 clinical isolates (GBS Subset 4(b)(i)). 423
of these GBS genes are conserved across the 11 clinical isolates (GBS Subset 4(b)(ii)). This list
can be determined by comparing the genes listed in Table 11 with the Comparative Genome
Hybridization in Figure 1. GBS Subset 4(b)(ii) also includes the GBS ORF's listed on Table 12
receiving a mark under the column "GBS specific".
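
As a hedged illustration of how the (i) and (ii) lists above could be derived, the sketch below splits a set of genes into "variable" and "conserved" groups from per-isolate hybridization calls. It assumes that a gene counts as conserved only when it is detected in every tested isolate; that reading, the function name `split_by_cgh` and the toy gene names are assumptions for illustration and are not taken from the tables or from Figure 1. The second part simply checks that the counts stated in the text add up within each Subset.

```python
def split_by_cgh(presence):
    """Split genes into (variable, conserved) lists.

    presence: gene name -> list of booleans, one per tested isolate,
    True meaning the gene was detected in that isolate (assumed encoding).
    """
    variable, conserved = [], []
    for gene, calls in presence.items():
        if all(calls):
            conserved.append(gene)   # detected in every isolate
        else:
            variable.append(gene)    # missing from at least one isolate
    return variable, conserved


# Toy example with three isolates (invented data)
toy_presence = {
    "gene_1": [True, True, True],
    "gene_2": [True, False, True],
    "gene_3": [False, False, True],
}
toy_variable, toy_conserved = split_by_cgh(toy_presence)

# Counts as stated in the text: Subset -> (total, variable, conserved)
reported = {
    "1(b)": (1060, 47, 1013),
    "2(b)": (225, 44, 181),
    "3(b)": (176, 44, 132),
    "4(b)": (683, 260, 423),
}
counts_add_up = {
    name: variable + conserved == total
    for name, (total, variable, conserved) in reported.items()
}
```

In each of the four Subsets the variable and conserved counts given above sum to the Subset total, so `counts_add_up` is true for every entry.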

An additional 63 GBS genes have been sequenced and compared in 2 - 11 clinical
isolates. These sequences and their alignments are provided in Tables 40 - 89. Polynucleotide
and polypeptide sequences which are specific or conserved across one or more clinical isolates
can be identified using these alignments.

The invention further provides polynucleotides which are likely recent genomic
duplications in GBS. These duplications include glycosyl transferases, sortases, proteins
anchored on the cell wall, β-lactam resistance factors, and many hypothetical proteins. The GBS
genes are listed in Table 4 (GBS Subset 5).

The invention is also based on the identification of a cluster of 13 adjacent genes
(SAG1410 - SAG1424) which is believed to encode enzymes required for synthesis of the group
B carbohydrate, a complex multiantennary structure of rhamnose, glucitol phosphate,
N-acetylglucosamine, and galactose (GBS Subset 6). Predicted proteins encoded within this
cluster include seven putative glycosyltransferases, four of which are similar to
rhamnosyltransferases in other streptococcal species; a putative dTDP-L-rhamnose synthase; and
proteins involved in glucitol synthesis. All nine recognized GBS capsular polysaccharide types
contain sialic acid residues as part of their repeating unit structure, a feature that contributes to
virulence by inhibiting activation of the alternative complement pathway. See Edwards et al.
(1982) J. Immunol. 128, 1278 - 1283.

The type V capsular polysaccharide gene cluster consists of 18 genes (GBS Subset
6(a)). A region of glycosyltransferases and related proteins (SAG1162 - SAG1170) that direct
the synthesis of the type V polysaccharide repeat unit is flanked on either side by genes that are
conserved in all known GBS capsule serotypes. Downstream of this region are genes that
encode enzymes for the biosynthesis and activation of sialic acid (SAG1158 - SAG1161).
Upstream of the serotype specific region are genes (SAG1171 - SAG1175) found not only in all
nine GBS capsular serotypes but also in a variety of other polysaccharide-producing
streptococci.

The invention is also based on the identification of GBS ORFs predicted to encode
proteins carrying a signal peptide (GBS Subset 7). These GBS ORF's are listed in Table 2
receiving a "+" under the column "signal peptide".

The invention is also based on the identification of GBS ORFs predicted to encode
proteins which are anchored on the cell wall through an LPxTG motif (GBS Subset 8). These
GBS ORF's are listed in Table 2 receiving a "+" under the column "sortase motif".

The invention is also based on the identification of GBS ORFs predicted to encode
lipoproteins (GBS Subset 9). These GBS ORF's are listed in Table 2 receiving a "+" under the
column "lipoprotein".

The invention is also based on the identification of two GBS ORF's predicted to encode
enzymes related to metabolism (GBS Subset 10). These GBS ORFs include a putative
pullulanase (SAG1216) and a neuraminidase-related protein (SAG1932).

The invention is also based on the identification of GBS ORF's predicted to encode 
proteins exposed on the cell surface (GBS Subset 11). These GBS ORF's are listed in Table 2 
receiving a "+" under the column "FACS". 

The invention is also based on the identification of 401 GBS ORF's from GBS strain
2603 V/R which were not detected in at least one other of the 11 tested clinical isolates (GBS
Subset 12). See the Comparative Genome Hybridization in Figure 1. 364 of these 401 ORF's
correspond to 15 regions containing more than 5 contiguous genes. Each region is identified in
Figure 1 by numerical yellow bullets. Each region comprises a subset as defined below:

Region 1: GBS Subset 12(a). This region is unique to GBS (SAG0218 - SAG0238).
This region is a possible plasmid or remnant of a phage and contains mostly hypothetical
proteins.

Region 2: GBS Subset 12(b) 

Region 3: GBS Subset 12(c) 

Region 4: GBS Subset 12(d)

Region 5: GBS Subset 12(e)

Region 6: GBS Subset 12(f) 

Region 7: GBS Subset 12(g) 

Region 8: GBS Subset 12(h). This region is specific to GBS (SAG1018 - SAG1037).
This region comprises 20 proteins of unknown function, most of which are predicted to be
membrane associated or secreted, and displays an atypical nucleotide composition.

Region 9: GBS Subset 12(i)

Region 10: GBS Subset 12(j)

Region 11: GBS Subset 12(k)

Region 12: GBS Subset 12(l)

Region 13: GBS Subset 12(m)

Region 14: GBS Subset 12(n). This region is unique to GBS and spans 33 genes
(SAG1989 - 2021), including 25 proteins of unknown function, some of which carry a cell-wall
anchor.

Region 15: GBS Subset 12(o).

This invention is also based on identification of clusters of GBS genes as set forth in
Figure 5 and Table 6. In Figure 5, the presence of a particular gene or gene cluster is indicated in
the figure by a red square and the absence of a gene or cluster by a black square. The
relationship between strains based on this analysis is depicted by the tree at the top of the figure.
The strains and their serotypes are indicated (NT: nontypeable). Clusters with identical profiles
are reduced to a single horizontal line and the number of genes in each cluster is indicated on the
right. The clusters of 5 or more genes, labeled in red text and numbered, are listed in Table 6.
The 1698 genes shared by all 19 strains are labeled in green text. Applicants identified the
following subsets:

GBS Subset 13 (a): Cluster 1 (from Table 6). 
GBS Subset 13 (b): Cluster 2 (from Table 6). 
GBS Subset 13 (c): Cluster 3 (from Table 6). 
GBS Subset 13 (d): Cluster 4 (from Table 6). 
GBS Subset 13 (e): Cluster 5 (from Table 6). 
GBS Subset 13 (f): Cluster 6 (from Table 6). 
GBS Subset 13 (g): Cluster 7 (from Table 6). 
GBS Subset 13 (h): Cluster 8 (from Table 6). 
GBS Subset 13 (i): Cluster 9 (from Table 6). 
GBS Subset 13 (j): Cluster 10 (from Table 6).
GBS Subset 13 (k): Cluster 11 (from Table 6).
GBS Subset 13 (l): Cluster 12 (from Table 6).
GBS Subset 13 (m): Cluster 13 (from Table 6). 
GBS Subset 13 (n): Cluster 14 (from Table 6).

GBS Subset 13 (o): Cluster 15 (from Table 6). 

GBS Subset 13 (p): Cluster 16 (from Table 6). 

GBS Subset 13 (q): 1698 ORFs shared by all strains. 
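
The grouping of genes by presence/absence profile that underlies GBS Subset 13 can be sketched as follows. This is a minimal illustration that assumes each gene has a tuple of 0/1 calls, one per strain, and that genes with identical tuples form one cluster; the function names, gene names and three-strain toy data are hypothetical and do not reproduce Figure 5 or Table 6.

```python
from collections import defaultdict


def cluster_by_profile(profiles):
    """Group genes whose presence/absence profiles are identical.

    profiles: gene name -> tuple of 0/1 values, one per strain (assumed encoding).
    Returns a dict mapping each distinct profile to a sorted list of genes.
    """
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[tuple(profile)].append(gene)
    return {profile: sorted(genes) for profile, genes in clusters.items()}


def core_genes(profiles):
    """Return the sorted genes detected in every strain."""
    return sorted(gene for gene, profile in profiles.items() if all(profile))


# Toy profiles across three strains (invented data)
toy_profiles = {
    "gene_1": (1, 1, 1),
    "gene_2": (1, 0, 1),
    "gene_3": (1, 0, 1),
    "gene_4": (1, 1, 1),
    "gene_5": (0, 1, 0),
}
toy_clusters = cluster_by_profile(toy_profiles)
toy_core = core_genes(toy_profiles)
```

In this toy input, `gene_2` and `gene_3` share one profile and would be drawn as a single line of size two, while `gene_1` and `gene_4` are present in all strains and play the role of the shared set described in GBS Subset 13 (q).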

The invention is also based on the identification of the polynucleotide sequences of 82
genes from up to 11 different GBS strains. 19 of these genes are listed on Table 7. A further
GBS Subset 14 includes this set of polynucleotide sequences from the 11 strains and their
encoded polypeptide sequences. In particular, GBS Subset 14 contains a Subset of
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved
between two or more strains (GBS Subset 14(a)). GBS Subset 14 further includes a Subset of
polynucleotide fragments of 15 or more contiguous nucleotides which are conserved
between two or more strains (GBS Subset 14(b)). GBS Subset 14 further includes a Subset of
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved
between three or more strains (GBS Subset 14(c)). GBS Subset 14 further includes a Subset of
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved
between four or more strains (GBS Subset 14(d)).

GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more
contiguous amino acids which are conserved between two or more strains (GBS Subset
14(e)). GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more
contiguous amino acids which are conserved between three or more strains (GBS Subset 14(f)).
GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more contiguous
amino acids which are conserved between four or more strains (GBS Subset 14(g)). GBS
Subset 14 further includes a Subset of polypeptide fragments of 10 or more contiguous amino
acids which are conserved across two or more strains (GBS Subset 14(h)).
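
One way to picture the fragment Subsets of GBS Subset 14 is as fixed-length windows that occur in the sequences of several strains. The sketch below is an illustration under stated assumptions rather than the procedure used to build the tables: it treats each strain's sequence as a plain string, takes "conserved" to mean an exact match, and reports every window of length `k` found in at least `min_strains` strains. The function name `shared_kmers`, the strain names and the short toy sequences are hypothetical.

```python
from collections import defaultdict


def shared_kmers(sequences, k=10, min_strains=2):
    """Find length-k fragments that occur in at least min_strains strains.

    sequences: strain name -> ungapped sequence string (nucleotides or
    amino acids). Returns a dict mapping each qualifying fragment to the
    set of strains that contain it. Exact matching is an assumption.
    """
    owners = defaultdict(set)
    for strain, seq in sequences.items():
        for start in range(len(seq) - k + 1):
            owners[seq[start:start + k]].add(strain)
    return {
        fragment: strains
        for fragment, strains in owners.items()
        if len(strains) >= min_strains
    }


# Toy sequences (invented): strain_B and strain_C each differ from
# strain_A at a single position.
toy_sequences = {
    "strain_A": "ATGGCTAAGCTTGACC",
    "strain_B": "ATGGCTAAGCTTGTCC",
    "strain_C": "ATGGCAAAGCTTGACC",
}
in_two_or_more = shared_kmers(toy_sequences, k=10, min_strains=2)
in_three_or_more = shared_kmers(toy_sequences, k=10, min_strains=3)
```

Under these assumptions, calling the function with `k=10` and `min_strains=2` mirrors the definition of GBS Subset 14(a), `k=15` mirrors 14(b), and `k=5` applied to polypeptide sequences mirrors 14(e). Any longer conserved fragment is made up of overlapping windows of the minimum length, so the windows can be merged afterwards if maximal fragments are wanted.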

The invention provides for methods of screening a Streptococcal genome for a conserved
or a specific genomic sequence using one or more of the Subsets of the invention.

The invention further provides for an immunogenic composition comprising a
polypeptide expressed by one or more of the polynucleotides in one or more of the Subsets of the
invention, and methods for designing an immunogenic composition by selecting one or more
polypeptides expressed by one or more of the polynucleotides in one or more of the Subsets of
the invention. Preferably, the immunogenic compositions of the invention comprise at least two,
three, four or five polypeptides encoded by polynucleotides within the same Subset.

The invention further provides for methods of screening compounds for activity against a
Streptococcal bacterium, which method comprises contacting the compounds with a polypeptide
expressed by the polynucleotide from one of the Subsets of the invention.

The invention further provides for compositions comprising one or more of the
polynucleotides, and fragments thereof, selected from the group consisting of the sequences set
forth in Tables 13 - 31 or 40 - 89.

The invention further provides for compositions comprising polypeptides and fragments
thereof encoded by the polynucleotides set forth in Tables 13 - 31 or 40 - 89.

The invention provides for compositions comprising polypeptides and fragments thereof
set forth in Tables 13 - 31 or 40 - 89.



BRIEF DESCRIPTION OF THE TABLES AND DRAWINGS

Table 1 comprises a complete list of GBS predicted genes, listed by SAGxxxx ORF
number. The SAGxxxx ORF number corresponds to the genomic sequence for the
Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by August
28, 2002 at http://www.tigr.org or at the GenBank database at accession number AE009948.
This table also includes the predicted amino acid size of the predicted expressed protein and the
predicted function, if known.

Table 2 comprises a list of predicted and experimentally characterized surface and
secreted proteins from GBS. The SAGxxxx ORF number corresponds to the genomic sequence
for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by
August 28, 2002 at http://www.tigr.org or at the GenBank database at accession number
AE009948.

Table 3 lists GBS genes which were shared among GBS, GAS and pneumococcus, but
which were not found in any of the other completely sequenced genomes. The SAGxxxx ORF
number corresponds to the genomic sequence for the Streptococcus agalactiae type V strain 2603
V/R available either at the TIGR website by August 28, 2002 at http://www.tigr.org or at the
GenBank database at accession number AE009948.

Table 4 depicts GBS genes which are predicted to have been recently duplicated within
the genome. The SAGxxxx ORF number corresponds to the genomic sequence for the
Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by August
28, 2002 at http://www.tigr.org or at the GenBank database at accession number AE009948.

Table 5 lists the 19 GBS strains used for comparative genome hybridisations and 
phylogenetic analysis. 

Table 6 lists clusters of GBS genes derived from phylogenetic profiling of GBS strains
based on comparative genome hybridisations. The SAGxxxx ORF number corresponds to the
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at
accession number AE009948.

Table 7 lists the GBS genes used for phylogenetic analyses of the 19 GBS strains. The
SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus agalactiae
type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at
http://www.tigr.org or at the GenBank database at accession number AE009948.

Table 8 lists the 1060 GBS ORF's which are shared with GAS and pneumococcus. The 
ORFxxxxx reference number can be translated to SAGxxxx ORF number by using Table 32. 
The SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus 
agalactiae type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at 
http://www.tigr.org or at the GenBank database at accession number AE009948.

Table 9 lists the 176 GBS ORF's which are shared with pneumococcus but which are not 
homologous to a GAS gene. The ORFxxxxx reference number can be translated to SAGxxxx 
ORF number by using Table 32. The SAGxxxx ORF number corresponds to the genomic 
sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR 
website by August 28, 2002 at http://www.tigr.org or at the GenBank database at accession 
number AE009948. 

Table 10 lists the 225 GBS ORF's which are shared with GAS but which are not 
homologous with a pneumococcus gene. The ORFxxxxx reference number can be translated to
SAGxxxx ORF number by using Table 32. The SAGxxxx ORF number corresponds to the 
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the 
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at 
accession number AE009948. 

Table 11 lists 683 GBS ORF's which are not shared with either GAS or pneumococcus.
The ORFxxxxx reference number can be translated to SAGxxxx ORF number by using Table 32. 
The SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus 
agalactiae type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at 
http://www.tigr.org or at the GenBank database at accession number AE009948. 

Table 12 lists 315 GBS ORF's which are not shared with GAS, pneumococcus or any 
other published genomic sequence. The ORFxxxxx reference number can be translated to 
SAGxxxx ORF number by using Table 32. The SAGxxxx ORF number corresponds to the 
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at 
accession number AE009948. 

Table 13 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0466. An alignment of each of the sequences is also included.

Table 14 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0471. An alignment of each of the sequences is also included.

Table 15 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0492. An alignment of each of the sequences is also included.

Table 16 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0767. An alignment of each of the sequences is also included.

Table 17 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1086. An alignment of each of the sequences is also included.

Table 18 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1600. An alignment of each of the sequences is also included.

Table 19 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1680. An alignment of each of the sequences is also included.

Table 20 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1723. An alignment of each of the sequences is also included.

Table 21 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0079. An alignment of each of the sequences is also included.

Table 22 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0093. An alignment of each of the sequences is also included.

Table 23 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0163. An alignment of each of the sequences is also included.

Table 24 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0290. An alignment of each of the sequences is also included.

Table 25 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0368. An alignment of each of the sequences is also included.

Table 26 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0503. An alignment of each of the sequences is also included.

Table 27 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG1473. An alignment of each of the sequences is also included.
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Table 28 lists the polynucleotide and polypeptide sequences of the 1 1 strains relating to 
GBS ORF SAG1552. An alignment of each of the sequences is also included. 

Table 29 lists the polynucleotide and polypeptide sequences of the 1 1 strains relating to 
GBS ORF SAG 1641. An alignment of each of the sequences is also included. 
5 Table 30 lists the polynucleotide and polypeptide sequences of the 1 1 strains relating to 

GBS ORF SAG2147. An alignment of each of the sequences is also included. 

Table 31 lists the polynucleotide and polypeptide sequences of the 1 1 strains relating to 
GBS ORF SAG2148. An alignment of each of the sequences is also included. 

Table 32 provides a conversion table for the ORFxxxx reference numbers to the 
SAGxxxx reference numbers. The SAGxxxx ORF number corresponds to the genomic sequence 
for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by 
August 28, 2002 at http://www.tigr.org or at the GenBank database at accession number 
AE009948. 

Table 33 lists the 1006 GAS ORF's which are shared with GBS and Spn. The sequences 
corresponding to these ORFs were published in GenBank, Accession No. AAK33146 (protein 
sequence). A link to the corresponding polynucleotide sequence is also available. The numbers 
for the GAS ORF refer directly to their GenBank entries. 

Table 34 lists the 212 GAS ORF's which are shared with GBS but which do not have 
homologues with pneumococcus. The sequences corresponding to these ORFs were published in 
GenBank, Accession No. AAK33146 (protein sequence). A link to the corresponding 
polynucleotide sequence is also available. The numbers for the GAS ORF refer directly to their 
GenBank entries. 

Table 35 lists the 62 GAS ORF's which have homologues with pneumococcus but which 
do not have homologues with GBS. The sequences corresponding to these ORFs were published 
in GenBank, Accession No. AAK33146 (protein sequence). A link to the corresponding 
polynucleotide sequence is also available. The numbers for the GAS ORF refer directly to their 
GenBank entries. 

Table 36 lists the 1034 Spn ORF's which are shared with GBS and GAS. These ORF's 
were published in GenBank. The numbers for Spn correspond to the entry for AE005672. 

Table 37 lists the 195 Spn ORF's which are shared with GBS but do not have 
homologues with GAS. These ORF's were published in GenBank. The numbers for Spn 
correspond to the entry for AE005672. 

Table 38 lists the 74 Spn ORF's which are shared with GAS but do not have homologues 
with GBS. These ORF's were published in GenBank. The numbers for Spn correspond to the 
entry for AE005672. 

Table 40 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0635. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 41 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0649. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 42 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0764. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 43 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0079. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 44 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0416. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 45 lists the polynucleotide and polypeptide sequences of 5 strains relating to GBS 
ORF SAG1404. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 46 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1615. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 47 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0739. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 48 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1474. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 49 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1502. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 50 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1024. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 51 lists the polynucleotide and polypeptide sequences of 7 strains relating to GBS 
ORF SAG0677. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 52 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1823. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 53 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0755. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 54 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0949. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 55 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1592. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 56 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0806. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 57 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1488. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 58 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0182. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 59 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG2147. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 60 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1945. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 61 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1030. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 62 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0690. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 63 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1912. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 64 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0827. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 65 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0231. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 66 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0754. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 67 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0475. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 68 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0499. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 69 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0032. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 70 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1280. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 71 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1333. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 72 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0941. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 73 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0981. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 74 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1572. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 75 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0671. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 76 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0260. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 77 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG2059. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 78 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1016. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 79 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG2150. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 80 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS 
ORF SAG1266. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 81 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0011. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 82 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0165. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 83 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0108. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 84 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0267. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 85 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1361. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 86 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1393. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 87 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS 
ORF SAG0645. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 88 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG0477. An alignment of the polynucleotide and polypeptide sequences is also included. 

Table 89 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS 
ORF SAG1350. An alignment of the polynucleotide and polypeptide sequences is also included. 

Figure 1 is a circular representation of the GBS genome and comparative hybridisations 
using microarrays. A color version of Figure 1 can be found in Tettelin et al., PNAS (2002) 
99(19): 12391-12396 and online at www.pnas.org. 

Figure 2 is a schematic representation of in silico comparisons between streptococci. A 
color version of Figure 2 can be found in Tettelin et al., PNAS (2002) 99(19): 12391-12396 
and online at www.pnas.org. 

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences. 

Figure 4 depicts a linear representation of the GBS genome. A color version of Figure 4 
can be found in the supporting information to Tettelin et al., PNAS (2002) 99(19): 12391-12396, 
available online at www.pnas.org. 

Figure 5 demonstrates phylogenetic profiling of GBS strains based on comparative 
genome hybridisations. A color version of Figure 5 can be found in the supporting information 
to Tettelin et al., PNAS (2002) 99(19): 12391-12396, available online at www.pnas.org. 

DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to one or more 
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. In particular, 
the invention relates to polynucleotides from Streptococcus which are conserved or specific to 
one or more of the species of S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group 
A streptococcus" or "GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The 
invention further relates to polynucleotides which are conserved or specific to one or more 
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII, and VIII. 
The invention still further relates to polynucleotides which are conserved or specific to one or 
more clinical isolates of a Streptococcus species. 

In order to facilitate an understanding of the invention, selected terms used in the 
application will be discussed below. 

As used herein, the phrase "species of Streptococcus" generally refers to species of the 
Streptococcus genus, including S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group A 
streptococcus" or "GAS") and S. agalactiae ("group B streptococcus" or "GBS"). 

As used herein, the phrase "Streptococcus species serotypes" generally refers to 
subdivisions based on a distinguishing characteristic within a specific Streptococcus species. 
The distinguishing characteristic can be identified by any of a wide range of diagnostic tools. 



WO 2004/018646 



PCT/US2003/026827 



For instance, GBS is generally recognized as comprising at least nine subdividing serotypes 
based on the structure of their polysaccharide capsule. 

As used herein, the phrases "serotype isolates" or "clinical isolates" generally refer to 
specific isolated bacterial strains of a specific Streptococcal species and serotype. 

As used herein in reference to bacterial genomes, the phrases "conserved" or "shared" 
generally refer to genomic sequences which have homologues in the two or more genomes in the 
reference. Homology references, as used in this application, are generally based on comparisons 
using FASTA3. See Pearson (2000) Methods Mol. Biol. 132: 185-219. When the homology 
reference involves a comparison between genes in GBS, GAS or Spn, homologous or shared 
genes are typically defined by using a FASTA3 P value cutoff of 10^-15. Where the homology 
reference involves a comparison between GBS, GAS or Spn and all other completely sequenced 
genomes, homologous or shared genes are typically defined by using a FASTA3 P value cutoff 
of 10^-5 or lower. 

As used herein in reference to bacterial genomes, the phrases "specific to" or "not shared" 
generally refer to genomic sequences which do not have homologues in the two or more 
genomes in the reference. 
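
By way of illustration only, the "shared" and "specific" definitions above might be applied to tabulated FASTA3 results as in the following Python sketch. The function names, the data layout (a mapping from genome label to best-hit P value) and the use of "less than or equal to" at the cutoff are assumptions made for the example; they are not part of FASTA3 or of the analysis described herein. 

```python
# Illustrative sketch only: names and data layout are hypothetical.
STREP_CUTOFF = 1e-15        # cutoff for comparisons among GBS, GAS and Spn
ALL_GENOMES_CUTOFF = 1e-5   # cutoff for comparisons with all other sequenced genomes


def has_homologue(p_value, cutoff):
    """Return True when a best-hit P value meets the cutoff.

    Treating a value exactly equal to the cutoff as a homologue is an assumption.
    """
    return p_value is not None and p_value <= cutoff


def shared_genomes(best_hits, cutoff=STREP_CUTOFF):
    """List the genomes in which an ORF counts as "shared".

    best_hits maps a genome label to the best P value found in that genome,
    or to None when no hit was reported.
    """
    return sorted(g for g, p in best_hits.items() if has_homologue(p, cutoff))


def is_specific(best_hits, cutoff=STREP_CUTOFF):
    """An ORF is "specific" ("not shared") when no genome yields a homologue."""
    return not shared_genomes(best_hits, cutoff)
```

For instance, an ORF whose best hits are 1e-40 in GAS and 1e-3 in Spn would be reported by `shared_genomes` as shared with GAS only. 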

Other software programs to compare identity and to determine homology between 
nucleotide sequences are known in the art, for example those described in section 7.7.18 of 
Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30. A 
preferred alignment program is GCG Gap (Genetics Computer Group, Wisconsin, Suite Version 
10.1), preferably using default parameters, which are as follows: open gap = 3; extend gap = 1. 

Sequences within a Subset of the invention include sequences which hybridize to the 
listed genes. Hybridization reactions can be performed under conditions of different 
"stringency". Conditions that increase stringency of a hybridization reaction are widely known 
and published in the art [e.g. page 7.52 of Sambrook et al. (1989) Molecular Cloning: A 
Laboratory Manual. NY, Cold Spring Harbor Laboratory]. Examples of relevant conditions 
include (in order of increasing stringency): incubation temperatures of 25°C, 37°C, 50°C, 55°C 
and 68°C; buffer concentrations of 10 x SSC, 6 x SSC, 1 x SSC, 0.1 x SSC (where SSC is 
0.15 M NaCl and 15 mM citrate buffer) and their equivalents using other buffer systems; 
formamide concentrations of 0%, 25%, 50%, and 75%; incubation times from 5 minutes to 24 
hours; 1, 2, or more washing steps; wash incubation times of 1, 2, or 15 minutes; and wash 
solutions of 6 x SSC, 1 x SSC, 0.1 x SSC, or de-ionized water. Hybridization techniques and 
their optimization are well known in the art [e.g. see Sambrook et al.; RNA Methodologies 
(Farrell, 1998) (Academic Press; ISBN 0-12-249695-7); Current Protocols in Molecular Biology 
(F.M. Ausubel et al., eds., 1987) Supplement 30; Short protocols in molecular biology (4th 
edition, 1999) Ausubel et al. eds. ISBN 0-471-32938-X; US patent 5,707,829 etc.]. 

Identity between polypeptide sequences can be determined using software programs 
known in the art, for example those described in section 7.7.18 of Current Protocols in 
Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30. A preferred alignment is 
determined by the Smith-Waterman homology search algorithm [Smith & Waterman (1981) Adv. 
Appl. Math. 2: 482-489] using an affine gap search with a gap open penalty of 12 and a gap 
extension penalty of 2, BLOSUM matrix 62. 
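
As a non-limiting illustration, a local alignment score with affine gap penalties can be computed along the following lines. This Python sketch uses the gap open penalty of 12 and gap extension penalty of 2 given above, but, to remain self-contained, it substitutes a simple match/mismatch score (a hypothetical stand-in) for the BLOSUM62 matrix. It also assumes that a gap of length k costs open + (k - 1) x extend; other programs may use a different convention. 

```python
def simple_score(x, y, match=5, mismatch=-4):
    """Stand-in substitution score (an assumption); BLOSUM62 would be used in practice."""
    return match if x == y else mismatch


def smith_waterman_affine(a, b, score=simple_score, gap_open=12, gap_extend=2):
    """Return the best local alignment score of a and b with affine gaps.

    A gap of length k is charged gap_open + (k - 1) * gap_extend (assumed convention).
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # h: best score of an alignment ending at (i, j);
    # e / f: best score ending with a gap that consumes b / a respectively.
    h = [[0] * (m + 1) for _ in range(n + 1)]
    e = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e[i][j] = max(e[i][j - 1] - gap_extend, h[i][j - 1] - gap_open)
            f[i][j] = max(f[i - 1][j] - gap_extend, h[i - 1][j] - gap_open)
            diag = h[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: scores never drop below zero.
            h[i][j] = max(0, diag, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best
```

Only the score is returned here; recovering the alignment itself would additionally require a traceback through the three matrices. 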

Typically, 50% identity or more between two proteins may be considered to be an 
indication of functional equivalence. A reference to a percentage sequence identity between two 
amino acid sequences means that, when aligned, that percentage of amino acids are the same in 
comparing the two sequences. 
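
For illustration, percentage identity over an existing pairwise alignment might be computed as in the following Python sketch. The choice of denominator (aligned columns in which neither sequence has a gap) and the use of "-" as the gap character are assumptions made for the example; other conventions, such as dividing by the full alignment length, are also in use. 

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over ungapped aligned columns.

    Both inputs must already be aligned and therefore of equal length.
    The denominator convention is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```

Under these assumptions, the aligned pair `MKT-LV` and `MKSALV` has five ungapped columns, four of which are identical, giving 80% identity. 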

The terms "polypeptide", "protein" and "amino acid sequence" as used herein generally 
refer to a polymer of amino acid residues and are not limited to a minimum length of the product. 
Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition. 
Both full-length proteins and fragments thereof are encompassed by the definition. Minimum 
fragments of polypeptides useful in the invention can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
14, 15, 18, 20, 25, 30, 35, 40 or 50 amino acids. Typically, polypeptides useful in this invention 
can have a maximum length suitable for the intended application. Generally, the maximum 
length is not critical and can easily be selected by one skilled in the art. 

Reference to polypeptides and the like also includes derivatives of the amino acid 
sequences of the invention. Such derivatives can include postexpression modifications of the 
polypeptide, for example, glycosylation, acetylation, phosphorylation, and the like. Amino acid 
derivatives can also include modifications to the native sequence, such as deletions, additions 
and substitutions (generally conservative in nature), so long as the protein maintains the desired 
activity. These modifications may be deliberate, as through site-directed mutagenesis, or may be 
accidental, such as through mutations of hosts which produce the proteins or errors due to PCR 
amplification. Furthermore, modifications may be made that have one or more of the following 
effects: reducing toxicity; facilitating cell processing (e.g., secretion, antigen presentation, etc.); 
and facilitating presentation to B-cells and/or T-cells. 

A "recombinant" protein is a protein which has been prepared by recombinant DNA 
techniques as described herein. In general, the gene of interest is cloned and then expressed in 
transformed organisms, as described further below. The host organism expresses the foreign 
gene to produce the protein under expression conditions. The polypeptides of the invention may 
be prepared by recombinant means. 

The term "polynucleotide", as known in the art, generally refers to a nucleic acid 
molecule. A "polynucleotide" can include both double- and single-stranded sequences and refers 
to, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic RNA and 
DNA sequences from viral (e.g. RNA and DNA viruses and retroviruses) or prokaryotic DNA, 
and especially synthetic DNA sequences. The term also captures sequences that include any of 
the known base analogs of DNA and RNA, and includes modifications such as deletions, 
additions and substitutions (generally conservative in nature), to the native sequence, so long as 
the nucleic acid molecule encodes a therapeutic or antigenic protein. These modifications may 
be deliberate, as through site-directed mutagenesis, or may be accidental, such as through 
mutations of hosts that produce the antigens. Modifications of polynucleotides may have any 
number of effects including, for example, facilitating expression of the polypeptide product in a 
host cell. 

The term "polynucleotide" further includes DNA, RNA, DNA/RNA hybrids, DNA and 
RNA analogues such as those containing modified backbones (with modifications in the sugar 
and/or phosphates e.g. phosphorothioates, phosphoramidites etc.), and also peptide nucleic acids 
(PNA) and any other polymer comprising purine and pyrimidine bases or other natural, 
chemically or biochemically modified, non-natural, or derivatized nucleotide bases etc. Nucleic 
acid according to the invention can be prepared in many ways (e.g. by chemical synthesis, from 
genomic or cDNA libraries, from the organism itself etc.) and can take various forms (e.g. single 
stranded, double stranded, vectors, probes etc.). 

A polynucleotide can encode a biologically active (e.g., immunogenic or therapeutic) 
protein or polypeptide. Depending on the nature of the polypeptide encoded by the 
polynucleotide, a polynucleotide can include as little as 10 nucleotides, e.g., where the 
polynucleotide encodes an antigen. The polynucleotides of the invention may comprise at least 
10, 13, 15, 18, 20, 22, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 80, 90 or 100 consecutive 
nucleotides. 

By "isolated" is meant, when referring to a polynucleotide or a polypeptide, that the 
indicated molecule is separate and discrete from the whole organism with which the molecule is 
found in nature or, when the polynucleotide or polypeptide is not found in nature, is sufficiently 
free of other biological macromolecules so that the polynucleotide or polypeptide can be used for 
its intended purpose. 

"Antibody" as known in the art includes one or more biological moieties that, through 
chemical or physical means, can bind to or associate with an epitope of a polypeptide of interest. 
The antibodies of the invention specifically bind to infectious prion conformations. The term 
"antibody" includes antibodies obtained from both polyclonal and monoclonal preparations, as 
well as the following: hybrid (chimeric) antibody molecules (see, for example, Winter et al. 
(1991) Nature 349: 293-299; and U.S. Patent No. 4,816,567); F(ab')2 and F(ab) fragments; Fv 
molecules (non-covalent heterodimers, see, for example, Inbar et al. (1972) Proc Natl Acad Sci 
USA 69:2659-2662; and Ehrlich et al. (1980) Biochem 19:4091-4096); single-chain Fv molecules 
(sFv) (see, for example, Huston et al. (1988) Proc Natl Acad Sci USA 85:5879-5883); dimeric 
and trimeric antibody fragment constructs; minibodies (see, e.g., Pack et al. (1992) Biochem 
31:1579-1584; Cumber et al. (1992) J Immunology 149B:120-126); humanized antibody 
molecules (see, for example, Riechmann et al. (1988) Nature 332:323-327; Verhoeyen et al. 
(1988) Science 239:1534-1536; and U.K. Patent Publication No. GB 2,276,169, published 21 
September 1994); and any functional fragments obtained from such molecules, wherein such 
fragments retain immunological binding properties of the parent antibody molecule. The term 
"antibody" further includes antibodies obtained through non-conventional processes, such as 
phage display. 

As used herein, the term "monoclonal antibody" refers to an antibody composition 
having a homogeneous antibody population. The term is not limited regarding the species or 
source of the antibody, nor is it intended to be limited by the manner in which it is made. Thus, 
the term encompasses antibodies obtained from murine hybridomas, as well as human 
monoclonal antibodies obtained using human rather than murine hybridomas. See, e.g., Cote, et 
al. Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, 1985, p 77. 

An "immunogenic composition" as used herein refers to a composition that comprises an 
antigenic molecule where administration of the composition to a subject results in the 
development in the subject of a humoral and/or a cellular immune response to the antigenic 
molecule of interest. The immunogenicity of the composition or the antigenicity of the molecule 
may be facilitated by the use of an adjuvant. 

The practice of the present invention will employ, unless otherwise indicated, 
conventional methods of chemistry, biochemistry, molecular biology, immunology and 
pharmacology, within the skill of the art. Such techniques are explained fully in the literature. 
See, e.g., Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pennsylvania: Mack 
Publishing Company, 1990); Methods In Enzymology (S. Colowick and N. Kaplan, eds., 
Academic Press, Inc.); and Handbook of Experimental Immunology, Vols. I-IV (D.M. Weir and 
C.C. Blackwell, eds., 1986, Blackwell Scientific Publications); Sambrook, et al., Molecular 
Cloning: A Laboratory Manual (2nd Edition, 1989); Handbook of Surface and Colloidal 
Chemistry (Birdi, K.S. ed., CRC Press, 1997); Short Protocols in Molecular Biology, 4th ed. 
(Ausubel et al. eds., 1999, John Wiley & Sons); Molecular Biology Techniques: An Intensive 
Laboratory Course, (Ream et al., eds., 1998, Academic Press); PCR (Introduction to 
Biotechniques Series), 2nd ed. (Newton & Graham eds., 1997, Springer Verlag); Peters and 
Dalrymple, Fields Virology (2d ed), Fields et al. (eds.), B.N. Raven Press, New York, NY. 

It is understood that the antibodies and methods of this invention are not limited to 
particular formulations or process parameters as such may, of course, vary. It is also to be 
understood that the terminology used herein is for the purpose of describing particular 
embodiments of the invention only, and is not intended to be limiting. 

All publications, patents and patent applications cited herein are hereby incorporated by 
reference in their entirety. 

Vaccines and Immunisation 

The invention provides an immunogenic composition comprising a polypeptide, or a 
fragment thereof, which is encoded by a polynucleotide sequence which is conserved across one 
or more species of Streptococcus. 

The polynucleotide is preferably conserved across one or more species of Streptococcus 
selected from the group consisting of GBS, GAS and pneumococcus. In one embodiment, the 
polynucleotide is a GBS polynucleotide which is homologous with at least one gene from both 
GAS and pneumococcus. Preferably, the GBS polynucleotide is selected from GBS Subset 1, 
which includes 1060 GBS genes which have homologues with both GAS and pneumococcus 
(Table 8). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from both GBS and pneumococcus. Preferably, the GAS 
polynucleotide is selected from GAS Subset 1, which includes 1006 GAS genes which have 
homologues with both GBS and pneumococcus. 

In another embodiment, the polynucleotide is a pneumococcal polynucleotide which is 
homologous with at least one gene from both GAS and GBS. Preferably, the pneumococcus 
polynucleotide is selected from Spn Subset 1, which includes 1034 pneumococcal genes which 
have homologues with both GBS and GAS. 

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous with at least one gene from GAS. Preferably, the polynucleotide is selected from 
one of the genes listed in GBS Subset 2, which includes 225 GBS genes which have homologues 
with GAS, but not with pneumococcus. 

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous with at least one gene from pneumococcus. Preferably, the polynucleotide is 
selected from GBS Subset 3, which includes 176 GBS genes which have homologues with 
pneumococcus. 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from GBS. Preferably, the polynucleotide is selected from 
GAS Subset 2, which includes 212 GAS genes which have a homologue with GBS. 

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from pneumococcus. Preferably, the polynucleotide is selected 
from GAS Subset 3, which includes 62 GAS genes which have a homologue with 
pneumococcus. 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one gene from GBS. Preferably, the polynucleotide is selected from 
Spn Subset 2, which includes 195 Spn genes which have a homologue with GBS. 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one gene from GAS. Preferably, the polynucleotide is selected from 
Spn Subset 3, which includes 74 Spn genes which have a homologue with GAS. 

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or
more species of Streptococcus.

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide which is specific to GBS, GAS and
pneumococcus. In one embodiment, the polynucleotide is a GBS polynucleotide which is
homologous to at least one gene from both GAS and pneumococcus. Preferably, the GBS 
polynucleotide is selected from GBS Subset 1. In an alternative embodiment, the polynucleotide 
is a GBS polynucleotide which is homologous to at least one gene from both GAS and 
pneumococcus, but which is not homologous to a gene in any other published bacterial genome
at the time of the invention. Preferably, the GBS polynucleotide is selected from one of the 12
GBS genes included in GBS Subset 1(a) (Table 3).

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous to at least one gene in both GBS and pneumococcus. Preferably, the GAS 
polynucleotide is selected from GAS Subset 1. In another embodiment, the polynucleotide is a
GAS polynucleotide which is homologous to at least one gene in both GBS and pneumococcus
but which is not homologous to any gene in any other published bacterial genome at the time of 
the invention. Preferably, the GAS polynucleotide is selected from GAS Subset 1(a). 

Alternatively, the polynucleotide is a pneumococcus polynucleotide which is homologous
to at least one gene in both GBS and GAS. Preferably, the pneumococcus polynucleotide is
selected from Spn Subset 1. In another embodiment, the polynucleotide is a pneumococcus
polynucleotide which is homologous to at least one gene in both GBS and GAS but which does
not have a homologue in any other published bacterial genome at the time of the invention.
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 1(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS.
In one embodiment, the polynucleotide is a GBS polynucleotide which is not homologous to a
gene in either GAS or pneumococcus. Preferably, the GBS polynucleotide is selected from one
of the 683 GBS genes included in GBS Subset 4. In a further embodiment, the polynucleotide is
a GBS polynucleotide which is not homologous to a gene in either GAS or pneumococcus or any
other published bacterial genome at the time of the invention. Preferably, the GBS
polynucleotide is selected from one of the 315 GBS genes in GBS Subset 4(a).

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GAS.
In one embodiment, the polynucleotide is a GAS polynucleotide which is not homologous to a
gene in either GBS or pneumococcus. Preferably, the GAS polynucleotide is selected from one
of the 416 GAS genes included in GAS Subset 4. In a further embodiment, the polynucleotide is
a GAS polynucleotide which does not have a homologue in either GBS or pneumococcus or in
any other published bacterial genome at the time of the invention. Preferably, the GAS
polynucleotide is selected from GAS Subset 4(a).

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to 
pneumococcus. In one embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is not homologous to a gene in either GBS or GAS. Preferably, the pneumococcus
polynucleotide is selected from one of the 836 Spn genes included in Spn Subset 4. In a further
embodiment, the polynucleotide is a pneumococcus polynucleotide which does not have a
homologue in either GBS or GAS or in any other published bacterial genome at the time of the
invention. Preferably, the pneumococcus polynucleotide is selected from Spn Subset 4(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS 
and GAS. In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous 
to at least one gene from GAS but is not homologous to a gene from pneumococcus. Preferably, 
the GBS polynucleotide is selected from one of the 225 GBS genes included in GBS Subset 2.
In another embodiment, the GBS polynucleotide is homologous to at least one gene from GAS 
but is not homologous to any gene from pneumococcus and does not have a homologue in any 
other published bacterial genome at the time of the invention. Preferably, the GBS 
polynucleotide is selected from GBS Subset 2(a). 

In another embodiment, the polynucleotide is a GAS polynucleotide which is
homologous to at least one gene from GBS but is not homologous to any gene from
pneumococcus. Preferably, the GAS polynucleotide is selected from one of the 212 GAS genes
included in GAS Subset 2. In another embodiment, the GAS polynucleotide is homologous to at
least one gene from GBS but is not homologous to any gene from pneumococcus and does not
have a homologue in any other published bacterial genome at the time of the invention.
Preferably, the GAS polynucleotide is selected from GAS Subset 2(a).

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS 
and pneumococcus. In one embodiment, the polynucleotide is a GBS polynucleotide which is
homologous to at least one gene from pneumococcus but is not homologous to any gene from
GAS. Preferably, the GBS polynucleotide is selected from one of the 176 GBS genes included
in GBS Subset 3. In another embodiment, the polynucleotide is a GBS polynucleotide which is
homologous with at least one gene from pneumococcus but is not homologous with any GAS
polynucleotide and does not have a homologous gene in any of the other published bacterial
genomes at the time of the invention. Preferably, the GBS polynucleotide is selected from GBS
Subset 3(a).

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is
homologous with at least one gene from GBS, but is not homologous with any gene from GAS.
Preferably, the pneumococcus polynucleotide is selected from one of the 195 Spn genes included
in Spn Subset 2. In another embodiment, the polynucleotide is a pneumococcus polynucleotide
which is homologous with at least one gene from GBS, but is not homologous with any gene
from GAS and does not have a homologous gene in any other published bacterial genome at the
time of the invention. Preferably, the pneumococcus polynucleotide is selected from Spn Subset
2(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GAS
and pneumococcus. In one embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous with at least one gene from pneumococcus but is not homologous with any gene 
from GBS. Preferably, the GAS polynucleotide is selected from one of the 62 GAS genes
included in GAS Subset 3. In another embodiment, the polynucleotide is a GAS polynucleotide
which is homologous with at least one gene from pneumococcus but is not homologous with any 
gene from GBS and is not homologous with any gene of any published bacterial genome at the 
time of the invention. Preferably, the GAS polynucleotide is selected from GAS Subset 3(a). 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is
homologous with at least one GAS polynucleotide, but is not homologous with any GBS gene.
Preferably, the pneumococcus polynucleotide is selected from one of the 74 Spn genes included in
Spn Subset 3. In another embodiment, the polynucleotide is a pneumococcus polynucleotide
which is homologous with at least one gene from GAS, but is not homologous with any gene
from GBS or with a gene from any other published bacterial genome at the time of the invention.
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 3(a). 
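
By way of illustration only, the Subset numbering used above can be restated as a simple decision rule. The following Python sketch assumes that homology of each gene to the other two species, and to other published bacterial genomes, has already been determined by some upstream comparison. The function name `assign_subset`, its arguments and the table `COMPARISON_ORDER` are hypothetical and form no part of the invention. Because each (a) Subset is drawn from the correspondingly numbered Subset, the sketch returns only the most specific label.

```python
# Hypothetical lookup: for each query species, the comparison species whose
# homologues define Subset 2 (first entry) and Subset 3 (second entry),
# as read from the embodiments described above.
COMPARISON_ORDER = {
    "GBS": ("GAS", "Spn"),
    "GAS": ("GBS", "Spn"),
    "Spn": ("GBS", "GAS"),
}


def assign_subset(species, homologous_to, in_other_genomes=True):
    """Return the Subset label for one gene of `species`.

    homologous_to    -- set of the other streptococcal species in which the
                        gene has at least one homologue.
    in_other_genomes -- False when the gene has no homologue in any other
                        published bacterial genome (the "(a)" Subsets).
    """
    first, second = COMPARISON_ORDER[species]
    in_first = first in homologous_to
    in_second = second in homologous_to
    if in_first and in_second:
        number = 1  # conserved in all three species
    elif in_first:
        number = 2
    elif in_second:
        number = 3
    else:
        number = 4  # specific to the query species
    suffix = "" if in_other_genomes else "(a)"
    return f"{species} Subset {number}{suffix}"
```

For instance, under these assumptions a GBS gene with a homologue in GAS only would be labelled GBS Subset 2, and a pneumococcal gene with a homologue in GAS only and in no other published genome would be labelled Spn Subset 3(a).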

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or
more Streptococcal species serotypes. Preferably, the polynucleotide is specific to a
Streptococcal species serotype selected from the Streptococcal species GBS, GAS and
pneumococcus. More preferably, the polynucleotide is specific to one or more GBS serotypes
selected from the group consisting of GBS serotype Ia, Ib, II, III, IV, V, VI, VII and VIII.

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is conserved across
one or more Streptococcal species serotypes. Preferably, the polynucleotide is specific to a
Streptococcal species serotype selected from the Streptococcal species GBS, GAS and
pneumococcus. More preferably, the polynucleotide is conserved across one or more GBS
serotypes selected from the group consisting of GBS serotype Ia, Ib, II, III, IV, V, VI, VII and
VIII.

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or
more clinical isolates of a Streptococcal species. Preferably, the polynucleotide is specific to a
Streptococcal species clinical isolate selected from the Streptococcal species GBS, GAS and
pneumococcus. More preferably, the polynucleotide is specific to one or more GBS clinical
isolates selected from the clinical isolates identified in Table 5. Still more preferably, the
polynucleotide is specific to one or more GBS clinical isolates having one or more genes 
selected from the genes listed in Table 7. 

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from both GAS and pneumococcus and which varies among
clinical isolates. In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from both GAS and pneumococcus and which is homologous 
with at least one gene from at least one of the clinical isolates identified in Table 5. In another 
embodiment, the polynucleotide is a GBS polynucleotide which is homologous to at least one
gene from both GAS and pneumococcus and which is homologous with at least one gene from
each of the clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from 
one of the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous to 
at least one gene from GAS and is not homologous to any gene from pneumococcus and which
varies among clinical isolates. In another embodiment, the polynucleotide is a GBS
polynucleotide which is homologous to at least one gene from GAS and is not homologous to
any gene from pneumococcus and which is homologous to at least one gene from at least one of 
the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from GAS and is not homologous to
any gene from pneumococcus and which is homologous to at least one gene from each of the
clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from one of the 
genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous to 
at least one gene from pneumococcus and is not homologous to any gene from GAS and which
varies among clinical isolates. In another embodiment, the polynucleotide is a GBS
polynucleotide which is homologous to at least one gene from pneumococcus and is not
homologous to any gene from GAS and which is homologous to at least one gene from at least 
one of the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a 
GBS polynucleotide which is homologous to at least one gene from pneumococcus and is not
homologous to any gene from GAS and which is homologous to at least one gene from each of
the clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from one of 
the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is not 
homologous to any gene from GAS or pneumococcus and which varies among clinical isolates.
In another embodiment, the polynucleotide is a GBS polynucleotide which is not homologous to
any gene from GAS or pneumococcus and which is homologous to at least one gene from at least 
one of the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a 
GBS polynucleotide which is not homologous to any gene from GAS or pneumococcus and 
which is homologous to at least one gene from each of the clinical isolates identified in Table 5.
Preferably, the polynucleotide is selected from one of the genes listed in Table 7. 
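
The isolate-level distinctions drawn in the preceding embodiments (a gene that varies among clinical isolates, as opposed to one with a homologue in each isolate) can be pictured with the short Python sketch below. It is a sketch under stated assumptions only: the function name `isolate_profile`, the input format and the isolate names used in any call are hypothetical, and the detection of a homologue in a given isolate is assumed to have been carried out beforehand by whatever comparative method is preferred.

```python
def isolate_profile(presence):
    """Summarise across how many clinical isolates a gene has a homologue.

    presence -- dict mapping a (hypothetical) isolate name to True when a
                homologue of the gene was detected in that isolate.
    """
    if not presence:
        raise ValueError("at least one isolate is required")
    hits = sum(1 for found in presence.values() if found)
    if hits == len(presence):
        return "homologue in each isolate"
    if hits > 0:
        # present in at least one isolate but not in all of them
        return "varies among isolates"
    return "no homologue in any isolate"
```

A gene reported as "varies among isolates" necessarily also satisfies the broader condition of having a homologue in at least one of the isolates.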

The invention further provides an immunogenic composition comprising a polypeptide, 
or a fragment thereof, which is encoded by a polynucleotide sequence which is conserved across 
one or more clinical isolates of a Streptococcal species. Preferably, the polynucleotide is
conserved across one or more Streptococcal clinical isolates selected from the Streptococcal
species GBS, GAS and pneumococcus. More preferably, the polynucleotide is conserved across
one or more GBS clinical isolates identified in Table 5. Still more preferably, the polynucleotide 
is conserved across one or more clinical isolates having one or more genes selected from the 
genes listed in Table 7. 

The invention further provides for an immunogenic composition comprising a
polypeptide, or a fragment thereof, encoded by a polynucleotide selected from one or more of the
Subsets of the invention. Accordingly, the invention provides for an immunogenic composition
comprising a polypeptide encoded by a polynucleotide selected from one or more of the
following Subsets: GBS Subset 1, GBS Subset 2, GBS Subset 3, GBS Subset 4, GAS Subset 1,
GAS Subset 2, GAS Subset 3, GAS Subset 4, Spn Subset 1, Spn Subset 2, Spn Subset 3, Spn
Subset 4, GBS Subset 1(a), GBS Subset 2(a), GBS Subset 3(a), GBS Subset 4(a), GAS Subset
1(a), GAS Subset 2(a), GAS Subset 3(a), GAS Subset 4(a), Spn Subset 1(a), Spn Subset 2(a),
Spn Subset 3(a), Spn Subset 4(a), GBS Subset 1(b), GBS Subset 2(b), GBS Subset 3(b), GBS
Subset 4(b), GBS Subset 5, GBS Subset 6, GBS Subset 6(a), GBS Subset 7, GBS Subset 8, GBS
Subset 9, GBS Subset 10, GBS Subset 11, GBS Subset 12, GBS Subset 12(a), GBS Subset
12(b), GBS Subset 12(c), GBS Subset 12(d), GBS Subset 12(e), GBS Subset 12(f), GBS Subset
12(g), GBS Subset 12(h), GBS Subset 12(i), GBS Subset 12(j), GBS Subset 12(k), GBS Subset
12(l), GBS Subset 12(m), GBS Subset 12(n), GBS Subset 12(o), GBS Subset 13(a), GBS Subset
13(b), GBS Subset 13(c), GBS Subset 13(d), GBS Subset 13(e), GBS Subset 13(f), GBS Subset
13(g), GBS Subset 13(h), GBS Subset 13(i), GBS Subset 13(j), GBS Subset 13(k), GBS Subset
13(l), GBS Subset 13(m), GBS Subset 13(n), GBS Subset 13(o), GBS Subset 13(p), GBS Subset
13(q), GBS Subset 14, GBS Subset 14(a), GBS Subset 14(b), GBS Subset 14(c), GBS Subset
14(d), GBS Subset 14(e), GBS Subset 14(f), GBS Subset 14(g), and GBS Subset 14(h).

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 1, GBS Subset 2, GBS Subset 3, and GBS Subset 4. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GAS Subset 1, GAS Subset 2, GAS Subset 3, and GAS Subset 4.

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: Spn Subset 1, Spn Subset 2, Spn Subset 3, and Spn Subset 4.

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 1(a), GBS Subset 2(a), GBS Subset 3(a), and GBS Subset 4(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GAS Subset 1(a), GAS Subset 2(a), GAS Subset 3(a), and GAS Subset 4(a).

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: Spn Subset 1(a), Spn Subset 2(a), Spn Subset 3(a), and Spn Subset 4(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 1(b), GBS Subset 2(b), GBS Subset 3(b), and GBS Subset 4(b). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from GBS Subset 5. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 6 and GBS Subset 6(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 7. 

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 8.

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 9. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 10. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 11. 

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 12, GBS Subset 12(a), GBS Subset 12(b), GBS Subset 12(c), GBS Subset
12(d), GBS Subset 12(e), GBS Subset 12(f), GBS Subset 12(g), GBS Subset 12(h), GBS Subset
12(i), GBS Subset 12(j), GBS Subset 12(k), GBS Subset 12(l), GBS Subset 12(m), GBS Subset
12(n), and GBS Subset 12(o).

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 13(a), GBS Subset 13(b), GBS Subset 13(c), GBS Subset 13(d), GBS
Subset 13(e), GBS Subset 13(f), GBS Subset 13(g), GBS Subset 13(h), GBS Subset 13(i), GBS
Subset 13(j), GBS Subset 13(k), GBS Subset 13(l), GBS Subset 13(m), GBS Subset 13(n), GBS
Subset 13(o), GBS Subset 13(p), and GBS Subset 13(q).

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 14, GBS Subset 14(a), GBS Subset 14(b), GBS Subset 14(c), GBS Subset
14(d), GBS Subset 14(e), GBS Subset 14(f), GBS Subset 14(g), and GBS Subset 14(h).

Each of the above-identified groups and subsets may be used to create immunogenic 
compositions comprising two or more Streptococcus polypeptides. The invention then provides 
for an immunogenic composition comprising a combination of Streptococcus polypeptides, said 
combination consisting of two, three, four, five, six, seven, eight, nine, or ten polypeptides
selected from one of the groups identified above. Preferably, the combination consists of two,
three, four or five polypeptides. Preferably, the polypeptides are all selected from the same 
group. Preferably, the polypeptides are selected from the same Subset described herein. The 
Streptococcus polypeptides are selected from GBS, GAS and pneumococcus. Preferably, all of 
the polypeptides in the combination are selected from the same species.

For example, the composition may comprise a combination of GBS polypeptides, said
combination consisting of two, three, four, five, six, seven, eight, nine, or ten polypeptides,
wherein each polypeptide is encoded by a GBS polynucleotide sequence which is homologous to
a polynucleotide sequence of both GAS and pneumococcus. Preferably, the combination
consists of two, three, four or five polypeptides. Preferably, the GBS polynucleotide sequences
are selected from GBS Subset 1.

As another example, the composition may comprise a combination of GBS polypeptides, 
said combination consisting of two, three, four or five polypeptides, wherein each polypeptide is 
encoded by a GBS polynucleotide sequence which is homologous to a polynucleotide sequence 
of GAS. Preferably, the GBS polynucleotide sequences are selected from GBS Subset 2.

The composition may comprise a combination of GBS polypeptides, said combination 
consisting of two, three, four or five polypeptides, wherein each polypeptide is encoded by a 
GBS polynucleotide sequence which is homologous to a polynucleotide sequence of 
Streptococcus pneumoniae. Preferably, the GBS polynucleotide sequences are selected from GBS
Subset 3.

The composition may comprise a combination of GBS polypeptides, said combination 
consisting of two, three, four or five polypeptides, wherein each polypeptide is encoded by a 
GBS serotype polynucleotide sequence which is homologous to at least one other GBS serotype. 
Preferably, the GBS polypeptides are encoded by GBS serotype polynucleotide sequences which 
are homologous to at least one other GBS serotype.

The invention further provides for an immunogenic composition comprising a 
polypeptide or a fragment thereof comprising a fusion protein encoded by one or more of the 
polynucleotides included in the Subsets of the invention. 

The invention further provides a method for designing an immunogenic composition, 
such as a vaccine, by selecting one or more polypeptides encoded by a polynucleotide selected
from one or more of the Subsets of the invention. Preferably, the immunogenic compositions of 
the invention comprise at least two, three, four or five polypeptides encoded by polynucleotides 
within the same Subset. 

The invention provides a method for raising an immune response in a patient by
administering any one of the immunogenic compositions set forth above. The choice of
immunogenic composition means that the immune response may be reactive against all three of
GAS, GBS and pneumococcus, may be reactive against only two of the three, or may be reactive
only against GBS.

Each of the immunogenic compositions described above may be prepared and
administered instead as a polynucleotide where the polypeptide is expressed in vivo.

The immune response is preferably an antibody response. It may be a protective immune 
response. The patient is preferably a human.

The immunogenic compositions of the invention may further comprise an adjuvant, as
discussed in further detail below.

Essential genes and knockouts 

The invention provides a Streptococcus bacterium wherein one or more genes within any 
of the Subsets of this invention have been knocked out. The choice of Subset means that the
knocked out gene may be, for instance, a gene found in GBS but not in GAS or pneumococcus
(e.g. which is involved in the pathogenesis of GBS, but not in the pathogenesis of GAS or
pneumococcus, such as binding GBS cellular targets).

Techniques for producing knockout bacteria are well known, and knockout Streptococci
of various species have been reported [e.g. Margolis et al. (2001) Antimicrob. Agents Chemother.
45:2432-2435; Zhang et al. (2000) Cell 102:827-837; Nizet et al. (2000) Infect. Immun.
68:4245-4254; Nizet et al. (1997) Adv. Exp. Med. Biol. 418:627-630; etc.].

The knockout mutation may be situated in the coding region of the gene or may lie within 
its transcriptional control regions (e.g. within its promoter).

The knockout mutation will reduce the level of mRNA encoding the corresponding
polypeptide to <1% of that produced by the wild-type bacterium, preferably <0.5%, more
preferably <0.1%, and most preferably to 0%. 
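
The graded thresholds recited above may be easier to follow when written out as a calculation. The Python sketch below is illustrative only: the function name `residual_tier` and its two arguments are hypothetical, and it assumes that the mutant and wild-type mRNA levels have been measured in the same arbitrary units by any suitable quantitative method.

```python
def residual_tier(mutant_level, wild_type_level):
    """Classify residual expression as a percentage of the wild-type level.

    Both levels are assumed to be non-negative measurements in the same
    units; the thresholds (1%, 0.5%, 0.1%, 0%) are those recited in the text.
    """
    if wild_type_level <= 0:
        raise ValueError("wild-type level must be positive")
    if mutant_level < 0:
        raise ValueError("mutant level cannot be negative")
    percent = 100.0 * mutant_level / wild_type_level
    if percent == 0:
        return "most preferred (0%)"
    if percent < 0.1:
        return "more preferred (<0.1%)"
    if percent < 0.5:
        return "preferred (<0.5%)"
    if percent < 1:
        return "knockout (<1%)"
    return "not a knockout (>=1%)"
```

The same tiers could be applied to polypeptide levels where expression is assessed at the protein rather than the mRNA level.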

The knockout mutants of the invention may be used as immunogenic compositions (e.g.
as vaccines) to prevent streptococcal infection. Such a vaccine may include the mutant as a live
attenuated bacterium.

The knockout mutants of the invention may be used to determine whether genes are 
essential for bacterial survival, either under normal or stress conditions. 

Antisense 

The invention provides a single-stranded nucleic acid comprising a fragment of x1 or
more nucleotides from a nucleotide sequence selected from one of the Subsets of the invention.
The choice of group means that the nucleic acid may be complementary to a gene sequence
found in GBS, GAS and pneumococcus, or a gene sequence specific to GBS.

The single-stranded nucleic acid is at least x1 nucleotides long. The value of x1 is at least
7 (e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 etc.). The single-stranded
nucleic acid may be at most x2 nucleotides long, wherein x2 is 100 or less (e.g. 99, 98, 97, 96, 95,
94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69,
68, 67, 66, 65, 64, 63, 62, 61, 60).

The nucleic acid is preferably of the formula 5'-(N)a-(X)-(N)b-3', wherein 0≤a≤15,
0≤b≤15, N is any nucleotide, and X is the fragment as defined above. The values of a and b may
independently be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15. Each individual nucleotide N
in the -(N)a- and -(N)b- portions of the nucleic acid may be the same or different. The length of
the nucleic acid (i.e. a+b+x1) is preferably x2 or less.
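
The length constraints on the single-stranded nucleic acid can be collected into a small check, given here as a non-limiting Python sketch. The function name `build_oligo`, the parameter names and the example sequences are hypothetical. The sketch treats the preferred upper limit on overall length as a hard limit, which is a simplifying assumption, and it does not itself verify that the fragment is taken from, or complementary to, a gene of any Subset.

```python
MIN_X1 = 7      # smallest fragment length recited in the text
MAX_X2 = 100    # largest overall length recited in the text
MAX_FLANK = 15  # upper bound on the flank lengths a and b


def build_oligo(fragment, flank5="", flank3="", x1=MIN_X1, x2=MAX_X2):
    """Assemble 5'-(N)a-(X)-(N)b-3' and enforce the recited length limits.

    fragment -- the fragment X, which must be at least x1 nucleotides long
    flank5   -- the 5' flank (N)a, with 0 <= a <= 15
    flank3   -- the 3' flank (N)b, with 0 <= b <= 15
    """
    if x1 < MIN_X1 or x2 > MAX_X2:
        raise ValueError("x1 must be at least 7 and x2 at most 100")
    if len(fragment) < x1:
        raise ValueError("fragment X is shorter than x1 nucleotides")
    if len(flank5) > MAX_FLANK or len(flank3) > MAX_FLANK:
        raise ValueError("each flank may contain at most 15 nucleotides")
    oligo = flank5 + fragment + flank3
    if len(oligo) > x2:
        # assumption of this sketch: the preferred limit is enforced strictly
        raise ValueError("overall length exceeds x2 nucleotides")
    return oligo
```

For example, `build_oligo("ATGCATGC", "GG", "C")` would assemble an eleven-nucleotide molecule with a=2 and b=1, whereas a four-nucleotide fragment would be rejected as shorter than x1.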

Antisense inhibition of streptococcal gene expression is known, e.g. Sato et al. (1998)
FEMS Microbiol Lett 159:241-245. Antibacterial antisense techniques are also disclosed in
international patent applications WO99/02673 and WO99/13893.

The single-stranded nucleic acid may reduce the level of polypeptide expression from the
complementary gene to <1% of that produced by the wild-type bacterium, preferably <0.5%,
more preferably <0.1%, and most preferably to 0%. 

Antisense experiments may be used to determine whether genes are essential for bacterial 
survival, either under normal or stress conditions. 

Screening methods 

The invention provides a method for screening compounds, wherein the method involves 
contacting the compounds with a polypeptide expressed by one or more of the polynucleotides 
selected from one of the Subsets of the invention. The method may be for screening for agonists
of the polypeptides, antagonists, antibiotics etc. The choice of group means, for instance, that
the method may be used for identifying an antibiotic with broad anti-streptococcal activity,
or for identifying an antibiotic specific to GBS.

Potential compounds for screening include small organic molecules, peptides, peptoids, 
polypeptides, lipids, metals, nucleotides, nucleosides, aptamers, polyamines, antibodies, and
derivatives thereof. Small organic molecules have a molecular weight between 50 and about
2,500 daltons, and most preferably in the range 200-800 daltons. Complex mixtures of 
substances, such as extracts containing natural products, compound libraries or the products of 
mixed combinatorial syntheses also contain potential antagonists.

Typically, a polypeptide is incubated with a test compound, and the mixture is then tested
to see if the polypeptide and test compound interact, or to see if the polypeptide's activity is 
inhibited. 

For preferred high-throughput screening methods, all the biochemical steps for this assay 
are performed in a single solution in, for instance, a test tube or microtitre plate, and the test
compounds are analysed initially at a single compound concentration. For the purposes of high 
throughput screening, the experimental conditions are adjusted to achieve a proportion of test 
compounds identified as "positive" compounds from amongst the total compounds screened.

The invention also provides a compound identified using these methods. These can be
used to treat or prevent streptococcal infection. The compound preferably has an affinity for the
adhesion-specific protein of at least 10^-7 M, e.g. 10^-8 M, 10^-9 M, 10^-10 M or tighter.

Distinguishing Streptococcal species 

The invention provides a method for determining whether a Streptococcus bacterium of 
interest is or is not in the species agalactiae, pyogenes or pneumoniae, comprising the step(s) of:
(a) contacting the bacterium with a nucleic acid probe comprising the sequence of a gene 
selected from one of the Subsets of the invention; and/or (b) contacting the bacterium with an 
antibody which binds to a polypeptide encoded by one or more of the polynucleotides of one or 
more of the Subsets of the invention. The choice of group means, for instance, that the method 
may be used for distinguishing GBS from GAS and from pneumococcus, or for confirming that a
bacterium is not a GAS or pneumococcus. 

The method will typically include the further step of detecting the presence or absence of 
an interaction between the bacterium of interest and the nucleic acid or protein. 

The bacterium of interest may be in a cell culture, for example, or may be within a 
biological sample believed or known to contain a streptococcus. It may be intact or may be, for
instance, lysed. 

The term "biological sample" encompasses a variety of sample types obtained from an 
organism and can be used in a diagnostic or monitoring assay. The term encompasses blood and 
other liquid samples of biological origin, solid tissue samples, such as a biopsy specimen or 
tissue cultures or cells derived therefrom and the progeny thereof. The term encompasses
samples that have been manipulated in any way after their procurement, such as by treatment
with reagents, solubilization, or enrichment for certain components. The term encompasses a 
clinical sample, and also includes cells in cell culture, cell supernatants, cell lysates, serum, 
plasma, biological fluids, and tissue samples. 

GBS 2603 Type V Genomic Sequence

Applicants have sequenced the complete genome of GBS clinical type V isolate
2603 V/R and performed comparative analyses of this sequence against other GBS strains,
other species of pathogenic Streptococci and other known bacterial species. The entire
genomic sequence is available by August 26, 2002 at http://www.tigr.org. This genomic
sequence is incorporated herein by reference in its entirety. The genomic sequence of GBS type
V isolate 2603 V/R is also set forth in International Patent Application WO 02/34771.

In one embodiment, the invention relates to the polynucleotides, and fragments and 
derivatives thereof, set forth in the GBS clinical type V isolate 2603 published genome which are
not disclosed within WO 02/34771. The invention further relates to polypeptides expressed by
the polynucleotides of the invention. 

Applicants have predicted that the GBS 2603 isolate contains approximately 2,176
genes. Each predicted gene is set forth in Table 1, listed by a SAGxxxx ORF number.
Table 1 also includes the predicted amino acid size of the expressed protein and the
predicted function, if known. The sequence of each SAG reference can be obtained at the TIGR
website.

Figure 1 is a circular representation of the GBS genome and comparative hybridisations
using microarrays. A color version of Figure 1 can be found in Tettelin et al., PNAS (2002)
99(19): 12391-12396 and online at www.pnas.org. The outer circle represents predicted
coding regions on the plus strand color coded by role categories: violet indicating amino acid
biosynthesis; light blue indicating biosynthesis of cofactors, prosthetic groups, and carriers; light
green indicating cell envelope; red indicating cellular processes; brown indicating central
intermediary metabolism; yellow indicating DNA metabolism; light gray indicating energy
metabolism; magenta indicating fatty acid and phospholipid metabolism; pink indicating protein
synthesis and fate; orange indicating purines, pyrimidines, nucleosides, and nucleotides; olive
indicating regulatory functions and signal transduction; dark green indicating transcription; teal
indicating transport and binding proteins; gray indicating unknown function; salmon indicating
other categories; blue indicating hypothetical proteins.

The second circle represents predicted coding regions on the minus strand. In the third
circle, black represents atypical nucleotide composition curve; green represents most atypical
regions; magenta represents insertion elements; red diamonds indicate rRNAs.

Circles 4-22 represent comparative hybridisations of strain 2603 V/R with 19 GBS
strains. Cy3/Cy5 (2603 V/R signal/test strain) ratio cutoffs were defined arbitrarily as follows:
Cy3/Cy5 = 1.0 - 3.0, gene present in the test strain (no color added); Cy3/Cy5 = 3.0 - 10.0,
ambiguous result (blue); Cy3/Cy5 > 10, gene absent in test strain (red).

Circles 4-9 represent type Ia strains 090, 515, A909, Davis, and DK8. Circles 10-11
represent type Ib strains S7 7357b and H36B. Circles 12-13 represent type II strains 18RS21
and DK21. Circles 14-18 represent type III strains COH1, COH31, D136C, M732 and M781. Circle
19 represents type V strain CJB111. Circles 20-21 represent type VIII strains SMU014 and
JM9130013. Circle 22 represents nontypable (NT) strain CJB110. Throughout Figure 1,
varying regions of five or more consecutive genes are indicated by yellow bullets.

Figure 4 depicts a linear representation of the GBS genome. The location of predicted 
coding regions color-coded by biological role (see Figure 1) is displayed. Arrowed boxes 
represent the direction of transcription for each ORF. The number of membrane-spanning 
domains predicted by TopPred is displayed as lipid bi-layers on top of ORFs, only for those 
whose products have five or more predicted membrane spanning regions. Genes coding for 
rRNAs (16S, 23S, 5S) and tRNAs (clover leaf structure with number of genes) are indicated. 
Predicted Rho-independent transcriptional terminators are represented by hairpins. 

ORFs were predicted by GLIMMER (See Delcher, et al., (1999) Nucleic Acids Res. 27,
4636 - 4641 and Salzberg, et al., (1998) Nucleic Acids Res. 26, 544-548) trained with ORFs
larger than 600 base pairs from the genomic sequence and GBS genes available in GenBank. All
predicted proteins larger than 30 amino acids were searched against a nonredundant protein
database. (See Fleischmann, et al., (1995) Science 269, 496 - 512). Frame-shifts and point
mutations were detected and corrected where appropriate; those remaining were annotated as
"authentic frame-shift" or "authentic point mutation". Protein membrane-spanning domains
were identified by TOPPRED (See Claros, et al., (1994) Comput. Appl. Biosci. 10, 685 - 686).
Candidate lipoprotein signal peptides (See Hayashi et al., (1990) J. Bioenerg. Biomembr. 22, 451
- 471) were flagged by N-terminal exact matches to the pattern
{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. Putative signal peptides were identified
by using SIGNALP (Nielsen, et al., (1997) Protein Eng. 10, 1 - 6). Two sets of hidden Markov models
were used to determine ORF membership in families and superfamilies: PFAM Ver. 5.5
(Bateman, et al., (2000) Nucleic Acids Res. 28, 263 - 266) and TIGRFAMS 1.0 (Haft et al.,
(2001) Nucleic Acids Res. 29, 41 - 43). Domain-based paralogous families were built by
performing all-versus-all searches on the protein sequences by using a modified version of a
previously described method (Niermann, et al., (2001) Proc. Natl. Acad. Sci. USA 98, 4136 -
4141). Potential lineage-specific gene duplications were estimated by identification of ORFs
more similar to ORFs within the GBS genome than to ORFs from other complete genomes. All
ORFs were searched with FASTA3 (Pearson (2000) Methods Mol Biol 132, 185-219) against
all ORFs from the complete genomes, and matches with a FASTA P value of 10⁻¹⁵ were
considered significant.
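
By way of illustration only, the lipoprotein signal peptide pattern given above can be read as a regular expression, where {DERK}(6) denotes six residues that are none of D, E, R or K. The following is a minimal sketch, not the program actually used; the function name and the 40-residue N-terminal window are assumptions introduced for the example.

```python
import re

# PROSITE-style pattern from the text, translated to a regular expression:
# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
LIPOPROTEIN_PATTERN = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def has_lipoprotein_motif(protein, n_terminal_window=40):
    """Return True if the pattern occurs in the first n_terminal_window residues.

    The window size is an assumption made for this sketch; the text only
    states that matches were N-terminal.
    """
    n_terminus = protein[:n_terminal_window].upper()
    return LIPOPROTEIN_PATTERN.search(n_terminus) is not None


# Hypothetical sequences used only to exercise the function.
print(has_lipoprotein_motif("MKKTLLAVSLLAGCGNK"))
print(has_lipoprotein_motif("MDEDKRDEKLLAGC"))
```

The second sequence fails because charged residues fall inside the six positions that must exclude D, E, R and K.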

The genome consists of a circular chromosome of 2,160,266 base pairs with a G+C
content of 35.7%. Base pair one of the chromosome was assigned within the putative origin of
replication. The genome contains 80 tRNAs, 7 rRNAs, and 3 sRNAs. Approximately 78% of the
2,176 predicted genes are transcribed in the same direction as that of DNA replication, a feature
also observed in S. pn. and other low-GC Gram positive organisms.

Biological roles were assigned to 1,409 (65%) of the predicted proteins according to a classification
scheme adapted from Riley (1993) Microbiol Rev. 57, 862 - 952. Another 527 predicted
proteins (24%) matched proteins of unknown function, and the remaining 240 (11%) had no
database match. The expression of 50 of these hypothetical proteins was confirmed by Western
Blot analysis, and the proteins were annotated as "proteins of unknown function." A total of 339
paralogous protein families were identified in strain 2603, containing 941 predicted proteins
(43% of the total).

The Western Blot analysis was conducted as follows. GBS strain 2603 V/R cells were
grown in Todd-Hewitt broth (Difco) to OD600nm = 0.5. The culture was centrifuged for 20
minutes at 5,000 rpm. The supernatant was discarded, and bacteria were washed once with PBS,
resuspended in 2 ml of 50 mM Tris-HCl pH 6.8, containing 400 units of Mutanolysin (Sigma),
and incubated 2 hours at 37°C. After three cycles of freeze and thaw, cellular debris was
removed by centrifugation at 14,000 rpm for 10 minutes, and the protein concentration of the
supernatant was measured by the Bio-Rad Protein assay, with BSA as a standard. Purified
recombinant proteins (50 ng) and total cell extracts (25 µg) derived from GBS serotype V 2603
V/R strain were separated by SDS/PAGE and electroblotted onto nitrocellulose membranes for 1
hour at 100 V. The membranes were saturated by overnight incubation at 4°C in 5% skimmed
milk and 0.1% Tween 20 in PBS and incubated for 1 hour at room temperature with sera from
immunized mice diluted 1:500 - 1:1,000 in saturation buffer. To reduce background due to
antibodies raised against contaminating E. coli proteins, sera were preincubated with E. coli
protein extracts absorbed on nitrocellulose strips. The membranes were washed twice in 3%
skimmed milk and 0.1% Tween 20 in PBS and incubated for 1 hour with a 1:1,000 dilution of
horseradish peroxidase-conjugated antimouse Ig (DAKO). After washing with 0.1% Tween 20
in PBS, the membranes were developed with the Opti-4CN Substrate Kit (Bio-Rad).

Table 2 comprises a list of predicted and experimentally characterized surface and
secreted proteins from GBS. Candidate signal peptides and lipoprotein motifs were predicted
with PSORT [Nakai, K. & Horton, P. (1999) Trends Biochem Sci 24, 34-6] and other methods
(see methods); sortase motifs (LPxTG) were detected using the FINDPATTERNS program of
the GCG Package [Devereux, J., Haeberli, P. & Smithies, O. (1984) Nucleic Acids Res 12, 387-95]
and hidden Markov models. Column "Other" indicates proteins that carry other motifs (e.g.
integrin-binding motif RGD) or that are similar to characterized surface-exposed proteins. Western
blot results were considered positive when the antibodies revealed a predominant band of the
expected molecular weight on the total protein extracts of S. agalactiae strain 2603 V/R; ORFs
without + or - in this column were not tested in western blot. FACS analyses were performed
for western blot positive proteins only. Western blot and FACS data are displayed only for
proteins carrying at least one of the other motifs shown in the table. Column "GBS specific"
indicates genes unique to S. agalactiae (when compared to other completely sequenced
genomes) that are present in all the S. agalactiae strains tested in comparative genome
hybridization analyses. Finally, only proteins carrying fewer than 3 predicted transmembrane
domains are shown in the table; other proteins are likely to be embedded in the cytoplasmic
membrane and are probably not exposed on the organism's surface.
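
As a simple illustration of the sortase motif search mentioned above, the LPxTG motif (x being any residue) can be located with a regular expression. This is a sketch only and is not the FINDPATTERNS program; the function name and the example sequence are hypothetical.

```python
import re

# LPxTG sortase motif: L, P, any residue, T, G.
SORTASE_MOTIF = re.compile(r"LP.TG")


def find_lpxtg(protein):
    """Return (1-based position, matched residues) for each LPxTG occurrence."""
    return [(m.start() + 1, m.group())
            for m in SORTASE_MOTIF.finditer(protein.upper())]


# Hypothetical sequence used only to exercise the function.
print(find_lpxtg("MKALPETGEE"))
```

A motif hit alone does not establish cell-wall anchoring, which is why the table combines it with the other predictions and experimental data described above.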

FACS data were collected as follows: GBS 2603 V/R strain cells were grown in Todd-Hewitt
broth (Difco) to OD600nm = 0.5. The culture was centrifuged for 20 minutes at 5,000
rpm, and bacteria were washed once with PBS, resuspended in PBS containing 0.05%
paraformaldehyde, and incubated for 1 hour at 37°C and then overnight at 4°C. Fifty microliters
of fixed bacteria (OD600nm 0.1) was washed once with PBS, resuspended in 20 µl of newborn
calf serum (Sigma), and incubated for 1 hour at 4°C in 100 µl of preimmune or immune sera
diluted 1:200 in dilution buffer (PBS, 20% newborn calf serum, 0.1% BSA). After
centrifugation and washing with 200 µl of washing buffer (0.1% BSA in PBS), samples were
incubated for 1 hour at 4°C with 50 µl of R-phycoerythrin-conjugated F(ab)2 goat anti-mouse
IgG (Jackson ImmunoResearch) diluted 1:100 in dilution buffer. Cells were washed with 200 µl
of washing buffer and resuspended in 200 µl of PBS. Samples were analysed by using a
FACSCalibur apparatus (Becton Dickinson), and data were analyzed by using CELL QUEST (Becton
Dickinson). A shift in mean fluorescence intensity of >75 channels compared with preimmune
sera from the same mice was considered positive. This cutoff was determined from the mean
plus two standard deviations of shifts obtained with control sera raised against mock purified
recombinant proteins from cultures of E. coli carrying the empty expression vector and included
in every experiment. Artifacts due to bacterial lysis were excluded by using antisera raised
against six different known cytoplasmic proteins, all of which gave negative results.
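
The derivation of a positivity cutoff as the mean plus two standard deviations of control shifts can be sketched as follows. The control values shown are invented for illustration, and the use of the sample (rather than population) standard deviation is an assumption not stated in the text.

```python
from statistics import mean, stdev


def positivity_cutoff(control_shifts):
    """Mean plus two sample standard deviations of the control shifts."""
    return mean(control_shifts) + 2 * stdev(control_shifts)


def is_positive(shift, cutoff):
    """A shift in mean fluorescence intensity above the cutoff is positive."""
    return shift > cutoff


# Hypothetical control shifts (in channels), for illustration only.
cutoff = positivity_cutoff([10, 20, 30])
print(cutoff, is_positive(80, cutoff), is_positive(35, cutoff))
```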

Regions of Atypical Nucleotide Composition. 

These regions were identified by χ² analysis: the distribution of all 64 trinucleotides
(3-mers) was computed for the complete genome in all six reading frames, followed by the 3-mer
distribution in 2,000-bp windows. Windows overlapped by 1,000 bp. For each window, the χ²
statistic on the difference between its 3-mer content and that of the whole genome was
computed.
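
One way the windowed χ² computation described above might be sketched is shown below. This is an illustrative reading, not the program actually used: 3-mers are counted at every offset on both strands to cover the six reading frames, and the genome-wide frequencies serve as the expected distribution for each window. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_3mers(seq):
    """Count 3-mers at every offset on both strands (all six reading frames)."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    counts = Counter()
    for strand in (seq, reverse):
        for i in range(len(strand) - 2):
            counts[strand[i:i + 3]] += 1
    return counts


def chi_square_scan(genome, window=2000, step=1000):
    """Return (window start, chi-square) for overlapping windows.

    Expected counts for a window are the genome-wide 3-mer frequencies
    scaled to the number of 3-mers counted in that window.
    """
    genome_counts = count_3mers(genome)
    genome_total = sum(genome_counts[t] for t in TRINUCLEOTIDES)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        window_counts = count_3mers(genome[start:start + window])
        window_total = sum(window_counts[t] for t in TRINUCLEOTIDES)
        chi2 = 0.0
        for t in TRINUCLEOTIDES:
            expected = genome_counts[t] / genome_total * window_total
            if expected > 0:
                chi2 += (window_counts[t] - expected) ** 2 / expected
        scores.append((start, chi2))
    return scores
```

Windows with the highest scores are the candidates for atypical composition; the threshold separating "atypical" from "most atypical" is not given in the text and is therefore left out of the sketch.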

In Silico Genome Comparisons 

The protein sets of S. agalactiae, Streptococcus pneumoniae and S. pyogenes were
compared by using FASTA3. A general description of the FASTA3 sequence comparison
program is given in Pearson, W.R., "Flexible Sequence Similarity Searching with the
FASTA3 Program Package", (2000) Methods Mol Biol, 132: 185-219. Shared genes were
defined using a FASTA3 P value cutoff of 10⁻¹⁵. These shared genes and genes that S. agalactiae
did not share with the other streptococci using this cutoff were subsequently searched against all
completely sequenced genomes, and genes were defined as unique to streptococci or S.
agalactiae when they did not share similarity with any other gene sets with a FASTA3 P value of
10⁻⁵ or lower. The use of two cutoffs provides for a more stringent analysis of shared or unique
genes.
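
The two-cutoff logic described above can be illustrated with a short sketch. It assumes, for the purposes of the example only, that each gene has already been reduced to its best FASTA3 P value against the other two streptococci and its best P value against all other completely sequenced genomes (None meaning no hit); the function and label names are hypothetical.

```python
SHARED_CUTOFF = 1e-15   # P value cutoff for genes shared among streptococci
UNIQUE_CUTOFF = 1e-5    # P value cutoff for similarity to any other genome


def classify_gene(best_p_streptococci, best_p_other_genomes):
    """Classify one S. agalactiae gene from its best P values.

    A gene with any hit at or below UNIQUE_CUTOFF outside the streptococci
    is not unique; otherwise it is unique to streptococci if shared at
    SHARED_CUTOFF, and unique to S. agalactiae if not.
    """
    if best_p_other_genomes is not None and best_p_other_genomes <= UNIQUE_CUTOFF:
        return "similar to genes in other genomes"
    shared = (best_p_streptococci is not None
              and best_p_streptococci <= SHARED_CUTOFF)
    return "unique to streptococci" if shared else "unique to S. agalactiae"


print(classify_gene(1e-30, 0.5))
print(classify_gene(1e-3, None))
```

Under this sketch, a gene whose best streptococcal match lies between the two cutoffs is treated as not shared, which reflects the stricter 10⁻¹⁵ criterion for calling a gene shared.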

Figure 2 is a schematic representation of in silico comparisons between streptococci. The
protein sets of GBS, S. pn., and GAS were compared by using FASTA3. Numbers under the
species name indicate genes that are not shared with the other species; values in parentheses are
the number of proteins in each species (excluding frame-shifted and degenerated genes).
Numbers in the intersections indicate genes shared by two or three species. These are displayed
in the color corresponding to the species used as the query (GBS: green; S.pn.: blue; GAS:
red). A color version of Figure 2 can be found in Tettelin et al., PNAS (2002) 99(19): 12391 -
12396 and online at www.pnas.org. Numbers in any given intersection are slightly different
due to gene duplications in some species.

Table 3 lists genes which were shared among GBS, GAS and pneumococcus, but which
were not found in any of the other completely sequenced genomes. The protein sets of
S. agalactiae, S. pneumoniae, and S. pyogenes were compared using FASTA3 [Pearson, W. R.
(2000) Methods Mol Biol 132, 185-219]. Shared genes were defined using a FASTA3 P value
cutoff of 10⁻¹⁵. These shared genes and genes that S. agalactiae did not share with the other
streptococci using this cutoff were subsequently searched against all completely sequenced
genomes, and genes were defined as unique to streptococci or S. agalactiae when they did not
share similarity with any other gene sets with a FASTA3 P value of 10⁻⁵ or lower.

Synteny

Regions of conservation of gene synteny were computed as windows of 10 kb spanning 
at least three genes whose order was conserved in the other species. Regions were merged if 
they were less than 20 kb apart. The number of genes within each broad region was then
calculated.
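
The merging step described above, in which conserved windows less than 20 kb apart are combined into broader regions, can be sketched as a standard interval merge. The coordinates, gene positions and function names below are hypothetical and serve only to illustrate the procedure.

```python
def merge_regions(regions, max_gap=20_000):
    """Merge (start, end) regions whose separation is less than max_gap bp."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def genes_in_region(region, gene_positions):
    """Count genes whose position falls within the region (inclusive)."""
    start, end = region
    return sum(1 for pos in gene_positions if start <= pos <= end)


# Hypothetical 10 kb windows and gene midpoints, for illustration only.
windows = [(0, 10_000), (25_000, 35_000), (60_000, 70_000)]
broad = merge_regions(windows)
print(broad)
print([genes_in_region(r, [500, 9_000, 30_000, 65_000]) for r in broad])
```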



Comparative Genome Hybridizations 

Comparative genome hybridizations (See Figure 1) using DNA microarrays were
performed between the sequenced type V strain 2603 V/R and 19 other GBS strains of multiple
serotypes (See Table %). Predicted genes from strain 2603 V/R were amplified by PCR and
arrayed on glass microscope slides. See Peterson, et al., (2000) J. Bacteriol. 182, 6192-6202.
Genomic DNA was labelled according to protocols provided by J. DeRisi
(www.microarrays.org/pdfs/GenomicDNALabel_B.pdf), except that the DNA was not digested
or sheared before labelling. Arrays were scanned with a GENEPIX 4000B scanner (Axon
Instruments, Foster City, CA), and individual hybridisation signals were quantitated with TIGR
SPOTFINDER. See Hedge, et al., (2000), Biotechniques 29, 548-550, 552-554, 556. Cy3/Cy5
(2603 V/R signal/test strain) ratio cutoffs were defined arbitrarily as Cy3/Cy5 = 1.0-3.0, gene
present in test strain; 3.0 - 10.0, ambiguous result; >10.0, gene absent. For ambiguous results,
the gene may be divergent in the test strain relative to 2603 V/R, or the gene may be absent in
the test strain but still produce a signal owing to a paralogous gene family or a repetitive element.
Although the cutoffs are arbitrary, they fit well with the results for the variation of the capsule
locus in the strains tested (see region 9 on Figure 1), where most genes are slightly divergent and
only a few are completely different.
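
The ratio cutoffs stated above translate directly into a small classification function. The sketch below is illustrative; the handling of ratios below 1.0 (treated as present) and of values falling exactly on 3.0 or 10.0 (treated as ambiguous) are assumptions, since the text does not specify them.

```python
def classify_ratio(cy3_cy5):
    """Classify a Cy3/Cy5 (2603 V/R signal/test strain) ratio."""
    if cy3_cy5 > 10.0:
        return "absent"
    if cy3_cy5 >= 3.0:
        return "ambiguous"
    # Ratios below 3.0, including any below 1.0, are treated as present here.
    return "present"


for ratio in (1.5, 5.0, 12.0):
    print(ratio, classify_ratio(ratio))
```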

The CGH detected 1,698 genes in all of the strains, whereas 401 genes from strain 2603
V/R (18% of the gene complement) were not detected in at least one other strain, suggesting that
they are absent or significantly divergent in those strains. Two hundred sixty (38%) of the 683
genes specific to S. agalactiae when compared with the other two streptococci (Fig. 2), including
virulence determinants and surface proteins, vary among S. agalactiae strains, whereas only 47
(4%) of the genes common to all three streptococcal species, including 5 of the 6 sortases
identified in the genome, vary among strains. Thus, the in silico analysis of genes shared by the
streptococci that are not expected to vary among this genus is consistent with the CGH analysis.
Forty-four (25%) of the genes shared by S. agalactiae and S. pneumoniae and 44 (20%) of those
shared by S. agalactiae and S. pyogenes vary in the CGH analysis. The first set contains many
glycosyl transferases and proteins carrying a cell-wall anchor, whereas the second set displays
many phage-related genes. One hundred thirty-six of the 315 genes unique to S. agalactiae
when compared with all sequenced genomes vary among strains. These include R5, three
capsular genes, two cell wall-anchored proteins, and three transcriptional regulators. Three
hundred sixty-four (91%) of the 401 varying genes correspond to 15 regions containing more
than 5 contiguous genes. Ten of these regions display an atypical nucleotide composition in
strain 2603 V/R (Fig. 1), consistent with the possibility that they were horizontally transferred
into this strain. Two of the largest regions (region 4, a prophage, and region 7, similar to Tn916
from Enterococcus faecalis) are flanked by insertion sequence elements. The 15 regions contain
many proteins predicted to be anchored on the cell wall or surface exposed, including Rib
(region 3), sortases, glycosyl transferases, the capsule locus (region 9, divergent in all strains but
the other type V strain CJB111), and phage-related genes. Region 14 is unique to S. agalactiae
and spans 33 genes (SAG1989-SAG2021), including 25 proteins of unknown function, some of
which carry a cell-wall anchor. It is flanked by an ISL3 transposase and displays an atypical
nucleotide composition. Region 1, unique to S. agalactiae, is a possible plasmid or remnant of a
phage (SAG0218-SAG0238), contains mostly hypothetical proteins, and is flanked by a site-specific
recombinase. Region 8 is specific to S. agalactiae, comprises 20 proteins of unknown
function (SAG1018-SAG1037), most of which are predicted to be membrane associated or
secreted, and displays an atypical nucleotide composition.

The CGH results were analyzed by profile clustering where genes are grouped based on
their distribution patterns (Fig. 5). Sixteen clusters of five or more contiguous and
noncontiguous genes comprising a total of 300 genes were identified (Table 6). Several clusters
correspond to regions of contiguous genes described above. Some clusters of genes that do not
share sequence similarity and are located at different loci in the genome display an identical
profile. For instance, a cluster of genes containing a surface antigen (SAG0674-SAG0681)
follows the same distribution as another cluster containing only hypothetical proteins (SAG0247-SAG0249).
A putative pathogenicity protein (SAG2063) also clusters with a region containing
several glycosyl transferases and Sec proteins (SAG1447-SAG1462).

Profile clustering was also used to group strains based on similarity of gene content (Fig.
5). In addition, the sequences of 19 genes from each of 11 S. agalactiae strains were determined
after PCR amplification and used for phylogenetic analyses. The strains were the following: type
Ia, 090 and A909; type Ib, H36B; type II, 18RS21; type III, COH1, M732 and M781; type V,
2603 V/R and 1169NT1; type VIII, JM9130013; and nontypeable strain CJB110. The set
comprised 8 housekeeping genes and 11 genes coding for proteins predicted to be surface-exposed
(Table 7).

The profile clustering was conducted as follows. The information on the presence and absence
of genes based on the comparative genome hybridisation results was used to group genes based on
their distribution patterns. The analysis used was essentially identical to that used for phylogenetic
profile analysis. See Pellegrini, et al., (1999) Proc. Natl. Acad. Sci. USA 96, 4285 - 4288.
Each gene was assigned a binary profile based on its presence or absence across the different
strains, with presence determined by a Cy3/Cy5 ratio < 3.0 and absence > 3.0. The gene profiles
were then clustered by using the single-linkage clustering algorithm with column weighting (all
with default settings) of CLUSTER (http://rana.lbl.gov). The CLUSTER program also groups
the strains (columns) based on similarity of gene profiles. Clusters of genes and strains were
viewed by using TREEVIEW (http://rana.lbl.gov).
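
The assignment of binary presence/absence profiles can be illustrated as follows. This sketch only groups genes with identical profiles, which is a simpler stand-in for, and not a reimplementation of, the single-linkage clustering performed by CLUSTER; the gene names and ratios are invented for the example, and a ratio of exactly 3.0 is treated as absent by assumption.

```python
from collections import defaultdict


def binary_profile(ratios, cutoff=3.0):
    """Return 1 (present) where the Cy3/Cy5 ratio is below the cutoff, else 0."""
    return tuple(1 if ratio < cutoff else 0 for ratio in ratios)


def group_by_profile(ratio_table):
    """Group gene names by identical presence/absence profile across strains."""
    groups = defaultdict(list)
    for gene, ratios in ratio_table.items():
        groups[binary_profile(ratios)].append(gene)
    return dict(groups)


# Hypothetical genes and ratios across three test strains.
table = {
    "geneA": [1.2, 12.0, 1.1],
    "geneB": [1.5, 4.0, 2.0],
    "geneC": [1.0, 1.0, 1.0],
}
print(group_by_profile(table))
```

Genes that share a profile, such as geneA and geneB in this example, would appear as a single horizontal line in a display of the kind described for Figure 5.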

Phylogenetic trees were inferred for the complete set of 19 genes and for the subsets of
housekeeping and surface-exposed genes. Because the branching patterns in all three trees were
identical, only the tree of the 19 genes is shown in Fig. 3. The degree of polymorphism of the
housekeeping and the surface-exposed genes is similar (~1 variable site among all of the strains
per 100 bp).
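
The degree of polymorphism quoted above, expressed as variable sites per 100 bp, can be computed from an alignment as sketched below. The treatment of gaps (ignored when deciding whether a column is variable) is an assumption, and the toy alignment is invented for illustration.

```python
def variable_sites_per_100bp(alignment):
    """Return the number of variable columns per 100 aligned positions."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    variable = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column if base != "-"}
        if len(bases) > 1:
            variable += 1
    return 100.0 * variable / length


# Hypothetical three-strain alignment with a single variable column.
print(variable_sites_per_100bp(["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]))
```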

The sequences of genes from the different strains were aligned by using CLUSTALW
(See Thompson (1994), Nucleic Acids Res. 22, 4673 - 4680.) and trimmed to remove
ambiguously aligned regions. Phylogenetic trees of individual genes and of concatenated
alignments of multiple genes were inferred by using maximum likelihood methods of PAUP*
4.0b10 (Sinauer, Sunderland, MA). Bootstrap analysis was carried out using PAUP* as well.
The possibility of recombination among strains was examined by using analysis of sequence
variation using SIMPLOT (S.C. Ray) and analysis of phylogenetic heterogeneity by using
MACCLADE (Sinauer).

Analysis of this variation showed no evidence for major recombination events between
the strains. There were no long stretches of polymorphic sites that strongly supported other trees
(analysis with MACCLADE), and there were no significant crossover events in plots of sequence
similarity between strains (analysis with SIMPLOT). Some strain groupings (clades) generated
by phylogenetic analysis were similar to clusters from the profile analysis (type III strains M781,
M732 and COH1; type Ia strain 090 and nontypable strain CJB110), whereas others were
different, possibly because of the aforementioned problems with the profile clustering. In both
the phylogenetic analysis and the profile clustering, there is serotype-dependent and -independent
clustering (Figs. 3 and 5). The presence of strains of the same serotype in different clades or
clusters could be due to lateral gene transfer.

Figure 5 demonstrates phylogenetic profiling of GBS strains based on comparative 
genome hybridisations. The information on presence and absence of genes based on the 
microarray comparative genome hybridization results was used for phylogenetic profile analysis. 
The presence of a particular gene or gene cluster is indicated in the figure by a red square and the 
absence of a gene or cluster by a black square. The relationship between strains based on this 
analysis is depicted by the tree at the top of the figure. The strains and their serotypes are 
indicated (NT: nontypeable). Clusters with identical profiles are reduced to a single horizontal 
line and the number of genes in each cluster is indicated on the right. The clusters of 5 or more 
genes, labeled in red text and numbered, are listed in Table 6. The 1698 genes shared by all 19 
strains are labeled in green text. 

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences. The
sequences of 19 genes (Table 7) from each of 11 GBS strains were aligned and trimmed to
remove ambiguously aligned regions, and phylogenetic trees were inferred. Strain names are 
indicated in bold, and serotypes are indicated under the strain names. Bootstrap values are 
indicated on the branches. 

Techniques 

A summary of standard techniques and procedures which may be employed in order to
perform the invention (e.g. to utilise the disclosed sequences for vaccination or diagnostic
purposes) follows. This summary is not a limitation on the invention; it gives examples that
may be used but are not required.

General 

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of 
molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. 
Such techniques are explained fully in the literature, e.g. Sambrook, Molecular Cloning: A Laboratory
Manual Second Edition (1989) or Third Edition (2000); DNA Cloning Volumes I and II (D.N Glover ed. 



44 



WO 2004/018646 



PCT/US2003/026827 



1985); Oligonucleotide Synthesis (M.J. Gait ed, 1984); Nucleic Acid Hybridization (B.D. Hames & S.J.
Higgins eds. 1984); Transcription and Translation (B.D. Hames & S.J. Higgins eds. 1984); Animal Cell
Culture (R.I. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A Practical
Guide to Molecular Cloning (1984); the Methods in Enzymology series (Academic Press, Inc.), especially
volumes 154 & 155; Gene Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987,
Cold Spring Harbor Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and
Molecular Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and
Practice, Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C. C. Blackwell eds 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

Further Definitions 

A composition containing X is "substantially free" of Y when at least 85% by weight of the total X+Y in
the composition is X. Preferably, X comprises at least about 90% by weight of the total of X+Y in the
composition, more preferably at least about 95% or even 99% by weight.

The term "comprising" means "including" as well as "consisting" e.g. a composition "comprising" X may
consist exclusively of X or may include something additional e.g. X + Y.

The singular forms "a", "an", and "the" include plural referents unless the context clearly dictates
otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of such polynucleotides
and reference to "an epithelial cell" includes reference to one or more cells and equivalents thereof known
to those skilled in the art, etc.

The term "heterologous" refers to two biological components that are not found together in nature. The 
components may be host cells, genes, or regulatory regions, such as promoters. Although the heterologous 
components are not found together in nature, they can function together, as when a promoter heterologous 
to a gene is operably linked to the gene. Another example is where a Streptococcal sequence is heterologous
to a mouse host cell. A further example would be two epitopes from the same or different proteins which
have been assembled in a single protein in an arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of 
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous unit of 
polynucleotide replication within a cell, capable of replication under its own control. An origin of
replication may be needed for a vector to replicate in a particular host cell. With certain origins of
replication, an expression vector can be reproduced at a high copy number in the presence of the appropriate 
proteins within the cell. Examples of origins are the autonomously replicating sequences, which are 
effective in yeast; and the viral T-antigen, effective in COS-7 cells. 

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having sequence
identity with the native or disclosed sequence. Depending on the particular sequence, the degree of
sequence identity between the native or disclosed sequence and the mutant sequence is preferably greater
than 50% (e.g. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the Smith-Waterman algorithm
as described above). As used herein, an "allelic variant" of a nucleic acid molecule, or region, for which
nucleic acid sequence is provided herein is a nucleic acid molecule, or region, that occurs essentially at the
same locus in the genome of another or second isolate, and that, due to natural variation caused by, for
example, mutation or recombination, has a similar but not identical nucleic acid sequence. A coding region
allelic variant typically encodes a protein having similar activity to that of the protein encoded by the gene
to which it is being compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated
regions of the gene, such as in regulatory control regions (e.g. see US patent 5,753,235).

Expression systems 

The Streptococcal nucleotide sequences can be expressed in a variety of different expression systems; for 
example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast. 

15 i. Mammalian Systems 

Mammalian expression systems are known in the art. A mammalian promoter is any DNA sequence capable
of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding
sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiating region, which is
usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base
pairs (bp) upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase
II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream
promoter element, usually located within 100 to 200 bp upstream of the TATA box. An upstream promoter
element determines the rate at which transcription is initiated and can act in either orientation [Sambrook et
al (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A Laboratory
Manual, 2nd ed.].
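
The positional relationship described above (a TATA box roughly 25-30 bp upstream of the transcription initiation site) can be checked on a candidate promoter sequence. The following Python sketch is illustrative only. It assumes the commonly cited TATA consensus TATA(A/T)A(A/T), and it assumes that distance is counted from the first base of the motif to the initiation site; the function name and example sequence are hypothetical.

```python
import re

# Commonly cited TATA box consensus, TATA(A/T)A(A/T); lookahead allows overlapping hits.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")

def find_tata_boxes(seq, tss_index, window=(25, 30)):
    """Return distances (bp upstream of the initiation site) of TATA-like motifs.

    tss_index is the 0-based index of the transcription initiation site in seq.
    Only motifs starting within the inclusive window upstream are reported.
    """
    distances = []
    for m in TATA_PATTERN.finditer(seq.upper()):
        distance = tss_index - m.start()
        if window[0] <= distance <= window[1]:
            distances.append(distance)
    return distances

# Hypothetical sequence: a TATAAAA motif starting 30 bp upstream of index 100.
example = "G" * 70 + "TATAAAA" + "G" * 23 + "ATGAAA"
print(find_tata_boxes(example, tss_index=100))
```
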

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences 
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include the 
SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter (Ad 
MLP), and herpes simplex virus promoter. In addition, sequences derived from non-viral genes, such as the 
murine metallothionein gene, also provide useful promoter sequences. Expression may be either
constitutive or regulated (inducible); depending on the promoter, expression can be induced with
glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described above,
will usually increase expression levels. An enhancer is a regulatory DNA sequence that can stimulate
transcription up to 1000-fold when linked to homologous or heterologous promoters, with synthesis
beginning at the normal RNA start site. Enhancers are also active when they are placed upstream or
downstream from the transcription initiation site, in either normal or flipped orientation, or at a distance of
more than 1000 nucleotides from the promoter [Maniatis et al. (1987) Science 236:1237; Alberts et al.
(1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements derived from viruses may be particularly
useful, because they usually have a broader host range. Examples include the SV40 early gene enhancer
[Dijkema et al (1985) EMBO J. 4:761] and the enhancer/promoters derived from the long terminal repeat
(LTR) of the Rous Sarcoma Virus [Gorman et al. (1982b) Proc. Natl Acad. Sci. 79:6111] and from human
cytomegalovirus [Boshart et al (1985) Cell 42:521]. Additionally, some enhancers are regulatable and
become active only in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and
Borelli (1986) Trends Genet 2:215; Maniatis et al (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired, the
N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.
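
Because cyanogen bromide cleaves polypeptides on the carboxyl side of methionine residues, it can be useful to predict which fragments a given protein would yield before choosing this treatment. The following Python sketch is a simplified, idealized model (it assumes complete cleavage after every methionine and ignores side reactions); the function name and example sequences are hypothetical.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M).

    Idealized model of cyanogen bromide cleavage: every Met is assumed to be cut.
    """
    fragments, current = [], ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

# Only an N-terminal Met: the initiator methionine is released, the rest stays intact.
print(cnbr_fragments("MKTAYIAK"))
# An internal Met is also cut, so the protein itself would be fragmented.
print(cnbr_fragments("MKTMAYK"))
```

In this model, a protein with internal methionine residues is split into several pieces, which suggests that this treatment suits proteins lacking internal methionines.
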

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for
secretion of the foreign protein in mammalian cells. Preferably, there are processing sites encoded between
the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence
fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion
of the protein from the cell. The adenovirus tripartite leader is an example of a leader sequence that provides
for secretion of a foreign protein in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells are
regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements,
flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 42:349; Proudfoot and Whitelaw
(1988) "Termination and 3' end processing of eukaryotic RNA." In Transcription and splicing (ed. B.D.
Hames and D.M. Glover); Proudfoot (1989) Trends Biochem. Sci. 14:105]. These sequences direct the
transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Examples of
transcription terminator/polyadenylation signals include those derived from SV40 [Sambrook et al (1989)
"Expression of cloned genes in cultured mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and transcription
termination sequence are put together into expression constructs. Enhancers, introns with functional splice
donor and acceptor sites, and leader sequences may also be included in an expression construct, if desired.
Expression constructs are often maintained in a replicon, such as an extrachromosomal element (eg.
plasmids) capable of stable maintenance in a host, such as mammalian cells or bacteria. Mammalian
replication systems include those derived from animal viruses, which require trans-acting factors to
replicate. For example, plasmids containing the replication systems of papovaviruses, such as SV40
[Gluzman (1981) Cell 23:175] or polyomavirus, replicate to extremely high copy number in the presence of
the appropriate viral T antigen. Additional examples of mammalian replicons include those derived from
bovine papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a prokaryotic host
for cloning and amplification. Examples of such mammalian-bacteria shuttle vectors include pMT2
[Kaufman et al. (1989) Mol Cell Biol 9:946] and pHEBO [Shimizu et al. (1986) Mol Cell Biol 5:1074].

The transformation procedure used depends upon the host to be transformed. Methods for introduction of
heterologous polynucleotides into mammalian cells are known in the art and include dextran-mediated 
transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion, 
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA 
into nuclei. 

Mammalian cell lines available as hosts for expression are known in the art and include many immortalized 
cell lines available from the American Type Culture Collection (ATCC), including but not limited to, 
Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells 
(COS), human hepatocellular carcinoma cells (eg. Hep G2), and a number of other cell lines. 

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, and is 
operably linked to the control elements within that vector. Vector construction employs techniques which 
are known in the art. Generally, the components of the expression system include a transfer vector, usually a 
bacterial plasmid, which contains both a fragment of the baculovirus genome, and a convenient restriction 
site for insertion of the heterologous gene or genes to be expressed; a wild type baculovirus with a sequence 
homologous to the baculovirus-specific fragment in the transfer vector (this allows for the homologous 
recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and
growth media. 

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the wild type 
viral genome are transfected into an insect host cell where the vector and viral genome are allowed to 
recombine. The packaged recombinant virus is expressed and recombinant plaques are identified and 
purified. Materials and methods for baculovirus/insect cell expression systems are commercially available 
in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit). These techniques are generally
known to those skilled in the art and fully described in Summers & Smith, Texas Agricultural Experiment
Station Bulletin No. 1555 (1987) ("Summers & Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above described
components, comprising a promoter, leader (if desired), coding sequence, and transcription termination
sequence, are usually assembled into an intermediate transplacement construct (transfer vector). This may
contain a single gene and operably linked regulatory elements; multiple genes, each with its own set of
operably linked regulatory elements; or multiple genes, regulated by the same set of regulatory elements.
Intermediate transplacement constructs are often maintained in a replicon, such as an extra-chromosomal
element (e.g. plasmids) capable of stable maintenance in a host, such as a bacterium. The replicon will have
a replication system, thus allowing it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is pAc373.
Many other vectors, known to those of skill in the art, have also been designed. These include, for example,
pVL985 (which alters the polyhedrin start codon from ATG to ATT, and which introduces a BamHI
cloning site 32 basepairs downstream from the ATT; see Luckow and Summers, Virology (1989) 77:31).
The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann. Rev.
Microbiol, 42:111) and a prokaryotic ampicillin-resistance (amp) gene and origin of replication for
selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any DNA
sequence capable of binding a baculovirus RNA polymerase and initiating the downstream (5' to 3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription
initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site. A
baculovirus transfer vector may also have a second domain called an enhancer, which, if present, is usually
distal to the structural gene. Expression may be either regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly useful
promoter sequences. Examples include sequences derived from the gene encoding the viral polyhedrin
protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in: The Molecular Biology
of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155 476; and the gene encoding the
p10 protein, Vlak et al, (1988), J. Gen. Virol. 69:165.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or baculovirus
proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 73:409). Alternatively,
since the signals for mammalian cell posttranslational modifications (such as signal peptide cleavage,
proteolytic cleavage, and phosphorylation) appear to be recognized by insect cells, and the signals required
for secretion and nuclear accumulation also appear to be conserved between the invertebrate cells and
vertebrate cells, leaders of non-insect origin, such as those derived from genes encoding human
α-interferon, Maeda et al., (1985), Nature 315:592; human gastrin-releasing peptide, Lebacq-Verheyden et al,
(1988), Molec. Cell Biol 8:3129; human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404;
mouse IL-3, Miyajima et al., (1987) Gene 55:273; and human glucocerebrosidase, Martin et al (1988)
DNA, 7:99, can also be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed with the
proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused foreign proteins
usually requires heterologous genes that ideally have a short leader sequence containing suitable translation
initiation signals preceding an ATG start signal. If desired, methionine at the N-terminus may be cleaved
from the mature protein by in vitro incubation with cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted from
the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised of a leader
sequence fragment that provides for secretion of the foreign protein in insects. The leader sequence
fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the
translocation of the protein into the endoplasmic reticulum.

After insertion of the DNA sequence and/or the gene encoding the expression product precursor of the
protein, an insect cell host is co-transformed with the heterologous DNA of the transfer vector and the
genomic DNA of wild type baculovirus - usually by co-transfection. The promoter and transcription
termination sequence of the construct will usually comprise a 2-5kb section of the baculovirus genome.
Methods for introducing heterologous DNA into the desired site in the baculovirus are known in the
art. (See Summers & Smith supra; Ju et al. (1987); Smith et al, Mol Cell Biol (1983) 3:2156; and Luckow
and Summers (1989)). For example, the insertion can be into a gene such as the polyhedrin gene, by
homologous double crossover recombination; insertion can also be into a restriction enzyme site engineered
into the desired baculovirus gene. Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in
place of the polyhedrin gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific
sequences and is positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious recombinant
baculovirus. Homologous recombination occurs at low frequency (between about 1% and about 5%); thus,
the majority of the virus produced after cotransfection is still wild-type virus. Therefore, a method is
necessary to identify recombinant viruses. An advantage of the expression system is a visual screen
allowing recombinant viruses to be distinguished. The polyhedrin protein, which is produced by the native
virus, is produced at very high levels in the nuclei of infected cells at late times after viral infection.
Accumulated polyhedrin protein forms occlusion bodies that also contain embedded particles. These
occlusion bodies, up to 15 µm in size, are highly refractile, giving them a bright shiny appearance that is
readily visualized under the light microscope. Cells infected with recombinant viruses lack occlusion
bodies. To distinguish recombinant virus from wild-type virus, the transfection supernatant is plaqued onto
a monolayer of insect cells by techniques known to those skilled in the art. Namely, the plaques are
screened under the light microscope for the presence (indicative of wild-type virus) or absence (indicative
of recombinant virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds)
at 16.8 (Supp. 10, 1990); Summers & Smith, supra; Miller et al. (1989).
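
Given the recombination frequency stated above (about 1% to about 5%), the number of plaques that would need to be screened can be estimated. The following Python sketch is illustrative only. It assumes that each plaque is independently recombinant with probability f, so that the chance of finding at least one recombinant among n plaques is 1 - (1 - f)^n; the function name and the 95% confidence level are assumptions.

```python
import math

def plaques_to_screen(recomb_fraction, confidence=0.95):
    """Smallest n such that 1 - (1 - f)**n >= confidence.

    Assumes plaques are independent and each is recombinant with probability f.
    """
    if not 0 < recomb_fraction < 1 or not 0 < confidence < 1:
        raise ValueError("fraction and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - recomb_fraction))

# Lower and upper ends of the recombination frequency range given in the text.
for f in (0.01, 0.05):
    print(f, plaques_to_screen(f))
```

Under these assumptions, roughly 300 plaques would be screened at a 1% frequency and roughly 60 at a 5% frequency to have a 95% chance of observing at least one occlusion-negative plaque.
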

Recombinant baculovirus expression vectors have been developed for infection into several insect cells. For
example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti, Autographa
californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni (WO
89/046699; Carbonell et al., (1985) J. Virol 55:153; Wright (1986) Nature 327:718; Smith et al, (1983)
Mol Cell Biol 3:2156; and see generally, Fraser, et al (1989) In Vitro Cell Dev. Biol 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally known to
those skilled in the art. See, eg. Summers & Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for stable
maintenance of the plasmid(s) present in the modified insect host. Where the expression product gene is
under inducible control, the host may be grown to high density, and expression induced. Alternatively,
where expression is constitutive, the product will be continuously expressed into the medium and the
nutrient medium must be continuously circulated, while removing the product of interest and augmenting
depleted nutrients. The product may be purified by such techniques as chromatography, eg. HPLC, affinity
chromatography, ion exchange chromatography, etc.; electrophoresis; density gradient centrifugation;
solvent extraction, etc. As appropriate, the product may be further purified, as required, so as to remove
substantially any insect proteins which are also present in the medium, so as to provide a product which is at
least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are incubated 
under conditions which allow expression of the recombinant protein encoding sequence. These conditions 
will vary, dependent upon the host cell selected. However, the conditions are readily ascertainable to those 
of ordinary skill in the art, based upon what is known in the art. 

iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art. Exemplary
plant cellular genetic expression systems include those described in patents, such as: US 5,693,506; US
5,659,122; and US 5,608,143. Additional examples of genetic expression in plant cell culture have been
described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions of plant protein signal peptides may
be found in addition to the references described above in Vaulcombe et al., Mol Gen. Genet. 209:33-40
(1987); Chandler et al, Plant Molecular Biology 3:407-418 (1984); Rogers, J. Biol Chem. 260:3731-3738
(1985); Rothstein et al, Gene 55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535
(1987); Wirsel et al., Molecular Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A
description of the regulation of plant gene expression by the phytohormone, gibberellic acid and secreted
enzymes induced by gibberellic acid can be found in R.L. Jones and J. MacMillin, Gibberellins: in:
Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038 (1990);
Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl Acad. Sci. 84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The expression
cassette is inserted into a desired expression vector with companion sequences upstream and downstream
from the expression cassette suitable for expression in a plant host. The companion sequences will be of
plasmid or viral origin and provide necessary characteristics to the vector to permit the vectors to move
DNA from an original cloning host, such as bacteria, to the desired plant host. The basic bacterial/plant
vector construct will preferably provide a broad host range prokaryote replication origin; a prokaryote
selectable marker; and, for Agrobacterium transformations, T DNA sequences for Agrobacterium-mediated
transfer to plant chromosomes. Where the heterologous gene is not readily amenable to detection, the
construct will preferably also have a selectable marker gene suitable for determining if a plant cell has been
transformed. A general review of suitable markers, for example for the members of the grass family, is
found in Wilmink and Dons, 1993, Plant Mol Biol Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome are also
recommended. These might include transposon sequences and the like for homologous recombination as
well as Ti sequences which permit random insertion of a heterologous expression cassette into a plant
genome. Suitable prokaryote selectable markers include resistance toward antibiotics such as ampicillin or
tetracycline. Other DNA sequences encoding additional functions may also be present in the vector, as is
known in the art.

The nucleic acid molecules of the subject invention may be included into an expression cassette for
expression of the protein(s) of interest. Usually, there will be only one expression cassette, although two or
more are feasible. The recombinant expression cassette will contain, in addition to the heterologous protein
encoding sequence, the following elements: a promoter region, plant 5' untranslated sequences, an initiation
codon (depending upon whether or not the structural gene comes equipped with one), and a transcription and
translation termination sequence. Unique restriction enzyme sites at the 5' and 3' ends of the cassette allow
for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The sequence
encoding the protein of interest will encode a signal peptide which allows processing and translocation of
the protein, as appropriate, and will usually lack any sequence which might result in the binding of the
desired protein of the invention to a membrane. Since, for the most part, the transcriptional initiation region
will be for a gene which is expressed and translocated during germination, by employing the signal peptide
which provides for translocation, one may also provide for translocation of the protein of interest. In this
way, the protein(s) of interest will be translocated from the cells in which they are expressed and may be
efficiently harvested. Typically secretion in seeds is across the aleurone or scutellar epithelium layer into
the endosperm of the seed. While it is not required that the protein be secreted from the cells in which the
protein is produced, this facilitates the isolation and purification of the recombinant protein.
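
The layout of the expression cassette described above (flanking restriction sites, promoter, 5' untranslated sequence, coding sequence with an initiation codon, and termination sequence) can be sketched in code. The following Python example is illustrative only. The function name is hypothetical, the part sequences are placeholders rather than real regulatory elements, and the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences are used simply as examples of flanking sites.

```python
def build_cassette(promoter, utr5, coding, terminator,
                   site5="GAATTC", site3="GGATCC"):
    """Concatenate cassette parts in 5' to 3' order.

    An ATG is prepended only when the coding sequence does not already start
    with one. A ValueError is raised if either flanking restriction site would
    occur more than once in the assembled cassette (i.e. is not unique).
    """
    coding = coding.upper()
    if not coding.startswith("ATG"):
        coding = "ATG" + coding
    cassette = site5 + promoter.upper() + utr5.upper() + coding + terminator.upper() + site3
    for site in (site5, site3):
        if cassette.count(site) != 1:
            raise ValueError(f"restriction site {site} is not unique in the cassette")
    return cassette

# Placeholder parts (hypothetical, not real promoter or terminator sequences).
print(build_cassette("CCCC", "AA", "GCTTAA", "TTTT"))
```
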

Since the ultimate expression of the desired gene product will be in a eucaryotic cell it is desirable to
determine whether any portion of the cloned gene contains sequences which will be processed out as introns
by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron" region may be
conducted to prevent losing a portion of the genetic message as a false intron code, Reed and Maniatis, Cell
41:95-105, 1985.
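
A first-pass check for sequences that might be processed as introns can be made by looking for the canonical GT (donor) and AG (acceptor) dinucleotides at a plausible spacing. The following Python sketch is a deliberately crude illustration: real splice-site recognition depends on much more sequence context, and the function name and the length limits used here are arbitrary assumptions.

```python
def candidate_introns(seq, min_len=70, max_len=500):
    """List (start, end) spans that begin with GT and end with AG.

    start is the 0-based index of the G of GT; end is exclusive.
    min_len and max_len bound the span length and are illustrative values.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GT":
            continue
        last_end = min(len(seq), start + max_len)
        for end in range(start + min_len, last_end + 1):
            if seq[end - 2:end] == "AG":
                hits.append((start, end))
    return hits

# Hypothetical coding sequence containing one GT...AG span of 74 nucleotides.
example = "ATGCCC" + "GT" + "C" * 70 + "AG" + "CCCTAA"
print(candidate_introns(example))
```

Spans reported by such a scan would only be candidates for closer inspection before any site-directed mutagenesis is planned.
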

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically transfer the
recombinant DNA. Crossway, Mol. Gen. Genet, 202:179-185, 1985. The genetic material may also be
transferred into the plant cell by using polyethylene glycol, Krens, et al., Nature, 296, 72-74, 1982. Another
method of introduction of nucleic acid segments is high velocity ballistic penetration by small particles with
the nucleic acid either within the matrix of small beads or particles, or on the surface, Klein, et al., Nature,
327, 70-73, 1987 and Knudsen and Muller, 1991, Planta, 185:330-336 teaching particle bombardment of
barley endosperm to create transgenic barley. Yet another method of introduction would be fusion of
protoplasts with other entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies,
Fraley, et al, Proc. Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al, Proc. Natl Acad.
Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the presence of plasmids 
containing the gene construct. Electrical impulses of high field strength reversibly permeabilize 
biomembranes allowing the introduction of the plasmids. Electroporated plant protoplasts reform the cell 
wall, divide, and form plant callus. 

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can be
transformed by the present invention so that whole plants are recovered which contain the transferred gene.
It is known that practically all plants can be regenerated from cultured cells or tissues, including but not
limited to all major species of sugarcane, sugar beet, cotton, fruit and other trees, legumes and vegetables.
Some suitable plants include, for example, species from the genera Fragaria, Lotus, Medicago, Onobrychis,
Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica,
Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia,
Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis,
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia,
Glycine, Lolium, Zea, Triticum, and Sorghum.

Means for regeneration vary from species to species of plants, but generally a suspension of transformed 
protoplasts containing copies of the heterologous gene is first provided. Callus tissue is formed and shoots 
may be induced from callus and subsequently rooted. Alternatively, embryo formation can be induced from 
the protoplast suspension. These embryos germinate as natural embryos to form plants. The culture media 
will generally contain various amino acids and hormones, such as auxin and cytokinins. It is also 
advantageous to add glutamic acid and proline to the medium, especially for such species as corn and 
alfalfa. Shoots and roots normally develop simultaneously. Efficient regeneration will depend on the 
medium, on the genotype, and on the history of the culture. If these three variables are controlled, then 
regeneration is fully reproducible and repeatable. 

In some plant cell culture systems, the desired protein of the invention may be excreted or alternatively, the 
protein may be extracted from the whole plant. Where the desired protein of the invention is secreted into 
the medium, it may be collected. Alternatively, the embryos and embryoless-half seeds or other plant tissue 
may be mechanically disrupted to release any secreted protein between cells and tissues. The mixture may 
be suspended in a buffer solution to retrieve soluble proteins. Conventional protein isolation and 
purification methods will then be used to purify the recombinant protein. Parameters of time, temperature,
pH, oxygen, and volumes will be adjusted through routine methods to optimize expression and recovery of
heterologous protein. 

iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence capable of
binding bacterial RNA polymerase and initiating the downstream (3') transcription of a coding sequence
(eg. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually
placed proximal to the 5' end of the coding sequence. This transcription initiation region usually includes an
RNA polymerase binding site and a transcription initiation site. A bacterial promoter may also have a
second domain called an operator, that may overlap an adjacent RNA polymerase binding site at which
RNA synthesis begins. The operator permits negatively regulated (inducible) transcription, as a gene
repressor protein may bind the operator and thereby inhibit transcription of a specific gene. Constitutive
expression may occur in the absence of negative regulatory elements, such as the operator. In addition,
positive regulation may be achieved by a gene activator protein binding sequence, which, if present, is
usually proximal (5') to the RNA polymerase binding sequence. An example of a gene activator protein is
the catabolite activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli
(E. coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:113]. Regulated expression may therefore be either
positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples
include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose (lac)
[Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include promoter sequences
derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al. (1980) Nuc. Acids Res. 8:4057;
Yelverton et al. (1981) Nucl Acids Res. 9:731; US patent 4,738,921; EP-A-0036776 and EP-A-0121775].
The β-lactamase (bla) promoter system [Weissmann (1981) "The cloning of interferon and other mistakes."
In Interferon 3 (ed. I. Gresser)], bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5
[US patent 4,689,406] promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. For
example, transcription activation sequences of one bacterial or bacteriophage promoter may be joined with
the operon sequences of another bacterial or bacteriophage promoter, creating a synthetic hybrid promoter
[US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac promoter comprised of both trp
promoter and lac operon sequences that is regulated by the lac repressor [Amann et al. (1983) Gene 25:167;
de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21]. Furthermore, a bacterial promoter can include naturally
occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and
initiate transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a
compatible RNA polymerase to produce high levels of expression of some genes in prokaryotes. The
bacteriophage T7 RNA polymerase/promoter system is an example of a coupled promoter system [Studier
et al. (1986) J. Mol. Biol. 189:113; Tabor et al. (1985) Proc Natl Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region (EPO-A-0 267
851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for the
expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9 nucleotides in length
located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975) Nature 254:34]. The SD
sequence is thought to promote binding of mRNA to the ribosome by the pairing of bases between the SD
sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979) "Genetic signals and nucleotide sequences
in messenger RNA." In Biological Regulation and Development: Gene Expression (ed. R.F. Goldberger)].
To express eukaryotic genes and prokaryotic genes with weak ribosome-binding site [Sambrook et al.
(1989) "Expression of cloned genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual].
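
The spacing described above (an SD sequence of 3-9 nucleotides located 3-11 nucleotides upstream of the initiation codon) can be checked on a candidate construct. The following Python sketch is illustrative only. It assumes the commonly cited core SD consensus AGGAGG, and it assumes that the spacing is counted as the number of nucleotides between the 3' end of the SD match and the A of the ATG; the function name and example sequences are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"  # commonly cited core Shine-Dalgarno consensus (assumption)

def find_sd(mrna, atg_index, min_len=3, spacing=(3, 11)):
    """Return (motif, gap) for the longest consensus fragment upstream of the ATG.

    gap is the number of nucleotides between the end of the motif and the ATG.
    Returns None when no fragment of at least min_len nucleotides is found.
    """
    mrna = mrna.upper().replace("U", "T")
    # Try the longest consensus fragments first so the best match is returned.
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        for k in range(len(SD_CONSENSUS) - length + 1):
            frag = SD_CONSENSUS[k:k + length]
            for gap in range(spacing[0], spacing[1] + 1):
                start = atg_index - gap - length
                if start >= 0 and mrna[start:start + length] == frag:
                    return frag, gap
    return None

# Hypothetical leader: AGGAGG followed by a 7-nucleotide spacer and then ATG.
leader = "CCC" + "AGGAGG" + "C" * 7 + "ATG" + "AAA"
print(find_sd(leader, atg_index=16))
```

A construct for which no match is returned would be a candidate for adding or strengthening a ribosome binding site.
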

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked with the
DNA molecule, in which case the first amino acid at the N-terminus will always be a methionine, which is
encoded by the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein
by in vitro incubation with cyanogen bromide or by either in vivo or in vitro incubation with a bacterial
methionine N-terminal peptidase (EPO-A-0 219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two amino acid
sequences. For example, the bacteriophage lambda cell gene can be linked at the 5' terminus of a foreign
gene and expressed in bacteria. The resulting fusion protein preferably retains a site for a processing
enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene [Nagai et al. (1984) Nature
309:810]. Fusion proteins can also be made with sequences from the lacZ [Jia et al. (1987) Gene 60:191],
trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al. (1989) J. Gen. Microbiol. 135:11], and cheY
[EP-A-0 324 647] genes. The DNA sequence at the junction of the two amino acid sequences may or may
not encode a cleavable site. Another example is a ubiquitin fusion protein. Such a fusion protein is made
with the ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method, native foreign
protein can be isolated [Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules that
encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion of the
foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes a signal
peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. The
protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space,
located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably there are
processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal peptide
fragment and the foreign gene.

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as
the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental Manipulation of
Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline phosphatase signal
sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an additional example, the signal
sequence of the alpha-amylase gene from various Bacillus strains can be used to secrete heterologous
proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 3' to the
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences
direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA.
Transcription termination sequences frequently include DNA sequences of about 50 nucleotides capable of
forming stem loop structures that aid in terminating transcription. Examples include transcription
termination sequences derived from genes with strong promoters, such as the trp gene in E. coli as well as
other biosynthetic genes.
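
As a rough illustration of what "capable of forming stem loop structures" means at the sequence level, the sketch below searches a DNA strand for perfect inverted repeats separated by a short loop. The function names, the stem length of 6, and the loop range of 3-8 nucleotides are hypothetical choices for the example; the specification gives only the approximate overall length of such regions.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, left_arm, loop, right_arm) for each perfect inverted repeat.

    The stem length and loop range are illustrative assumptions only.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)  # sequence the right arm must have
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) == stem and right == target:
                hits.append((i, left, seq[i + stem:j], right))
    return hits


if __name__ == "__main__":
    # Hypothetical terminator-like sequence with a GC-rich hairpin.
    for hit in find_stem_loops("AAGCCCGCCTTTTGGCGGGCAA"):
        print(hit)
```

Overlapping hits for the same hairpin are expected, because a longer stem can be divided into a fixed-length stem and a loop in more than one way.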

Usually, the above described components, comprising a promoter, signal sequence (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression constructs.
Expression constructs are often maintained in a replicon, such as an extrachromosomal element (eg.
plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will have a replication
system, thus allowing it to be maintained in a prokaryotic host either for expression or for cloning and
amplification. In addition, a replicon may be either a high or low copy number plasmid. A high copy
number plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10
to about 150. A host containing a high copy number plasmid will preferably contain at least about 10, and
more preferably at least about 20 plasmids. Either a high or low copy number vector may be selected,
depending upon the effect of the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to the bacterial chromosome
that allows the vector to integrate. Integrations appear to result from recombinations between homologous
DNA in the vector and the bacterial chromosome. For example, integrating vectors constructed with DNA
from various Bacillus strains integrate into the Bacillus chromosome (EP-A-0 127 328). Integrating vectors
may also be comprised of bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow
for the selection of bacterial strains that have been transformed. Selectable markers can be expressed in the
bacterial host and may include genes which render bacteria resistant to drugs such as ampicillin,
chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline [Davies et al. (1978) Annu. Rev.
Microbiol. 32:469]. Selectable markers may also include biosynthetic genes, such as those in the histidine,
tryptophan, and leucine biosynthetic pathways.

Alternatively, some of the above described components can be put together in transformation vectors.
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon
or developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have
been developed for transformation into many bacteria. For example, expression vectors have been
developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci.
USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia coli [Shimatake et al.
(1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al. (1986) J. Mol. Biol. 189:113;
EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907], Streptococcus cremoris [Powell et al. (1988) Appl.
Environ. Microbiol. 54:655], Streptococcus lividans [Powell et al. (1988) Appl. Environ. Microbiol.
54:655], Streptomyces lividans [US patent 4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually include
the transformation of bacteria treated with CaCl2 or other agents, such as divalent cations and DMSO.
DNA can also be introduced into bacterial cells by electroporation. Transformation procedures usually vary
with the bacterial species to be transformed. See eg. [Masson et al. (1989) FEMS Microbiol. Lett. 60:273;
Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO
84/04541; Bacillus], [Miller et al. (1988) Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol.
172:949; Campylobacter], [Cohen et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic
Acids Res. 16:6121; Kushner (1978) "An improved method for transformation of Escherichia coli with
ColE1-derived plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo (1988)
Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett. 44:113;
Lactobacillus], [Fiedler et al. (1988) Anal. Biochem. 170:38; Pseudomonas], [Augustin et al. (1990) FEMS
Microbiol. Lett. 66:203; Staphylococcus], [Barany et al. (1980) J. Bacteriol. 144:698; Harlander (1987)
"Transformation of Streptococcus lactis by electroporation," in: Streptococcal Genetics (ed. J. Ferretti and R.
Curtiss III); Perry et al. (1981) Infect. Immun. 32:1295; Powell et al. (1988) Appl. Environ. Microbiol.
54:655; Somkuti et al. (1987) Proc. 4th Eur. Cong. Biotechnology 1:412; Streptococcus].

v. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any DNA
sequence capable of binding yeast RNA polymerase and initiating the downstream (3') transcription of a
coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation region
which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region
usually includes an RNA polymerase binding site (the "TATA Box") and a transcription initiation site. A
yeast promoter may also have a second domain called an upstream activator sequence (UAS), which, if
present, is usually distal to the structural gene. The UAS permits regulated (inducible) expression.
Constitutive expression occurs in the absence of a UAS. Regulated expression may be either positive or negative,
thereby either enhancing or reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding enzymes in
the metabolic pathway provide particularly useful promoter sequences. Examples include alcohol
dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-phosphate isomerase,
glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase, phosphofructokinase,
3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203). The yeast PHO5 gene, encoding
acid phosphatase, also provides useful promoter sequences [Myanohara et al. (1983) Proc. Natl. Acad. Sci.
USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation region of
another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid promoters include
the ADH regulatory sequence linked to the GAP transcription activation region (US Patent Nos. 4,876,197
and 4,880,734). Other examples of hybrid promoters include promoters which consist of the regulatory
sequences of either the ADH2, GAL4, GAL10, or PHO5 genes, combined with the transcriptional
activation region of a glycolytic enzyme gene such as GAP or PyK (EP-A-0 164 556). Furthermore, a yeast
promoter can include naturally occurring promoters of non-yeast origin that have the ability to bind yeast
RNA polymerase and initiate transcription. Examples of such promoters include, inter alia, [Cohen et al.
(1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981)
Curr. Topics Microbiol. Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic
Resistance Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene 11:163;
Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly linked
with the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein
will always be a methionine, which is encoded by the ATG start codon. If desired, methionine at the
N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, baculovirus,
and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal portion of an
endogenous yeast protein, or other stable protein, is fused to the 5' end of heterologous coding sequences.
Upon expression, this construct will provide a fusion of the two amino acid sequences. For example, the
yeast or human superoxide dismutase (SOD) gene can be linked at the 5' terminus of a foreign gene and
expressed in yeast. The DNA sequence at the junction of the two amino acid sequences may or may not
encode a cleavable site. See eg. EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a
fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (eg.
ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign protein. Through this
method, therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for
secretion in yeast of the foreign protein. Preferably, there are processing sites encoded between the leader
fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence fragment
usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the
protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, such as the 
yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US patent 4,588,684). 
Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that also provide for secretion 
in yeast (EP-A-0 060 057). 

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor gene, which 
contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor fragments that can be 
employed include the full-length pre-pro alpha factor leader (about 83 amino acid residues) as well as 
truncated alpha-factor leaders (usually about 25 to about 50 amino acid residues) (US Patents 4,546,083 and 
4,870,008; EP-A-0 324 274). Additional leaders employing an alpha-factor leader fragment that provides 
for secretion include hybrid alpha-factor leaders made with a presequence of a first yeast, but a pro-region
from a second yeast alpha-factor (eg. see WO 89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3' to the 
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences 
direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. 
Examples of transcription terminator sequences and other yeast-recognized termination sequences include
those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding sequence of
interest, and transcription termination sequence, are put together into expression constructs. Expression
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable
of stable maintenance in a host, such as yeast or bacteria. The replicon may have two replication systems,
thus allowing it to be maintained, for example, in yeast for expression and in a prokaryotic host for cloning
and amplification. Examples of such yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979)
Gene 8:17-24], pC1/1 [Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17
[Stinchcomb et al. (1982) J. Mol. Biol. 158:151]. In addition, a replicon may be either a high or low copy
number plasmid. A high copy number plasmid will generally have a copy number ranging from about 5 to
about 200, and usually about 10 to about 150. A host containing a high copy number plasmid will
preferably have at least about 10, and more preferably at least about 20. Either a high or low copy number
vector may be selected, depending upon the effect of the vector and the foreign protein on the host. See eg.
Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating vector.
Integrating vectors usually contain at least one sequence homologous to a yeast chromosome that allows the
vector to integrate, and preferably contain two homologous sequences flanking the expression construct.
Integrations appear to result from recombinations between homologous DNA in the vector and the yeast
chromosome [Orr-Weaver et al. (1983) Methods in Enzymol. 101:228-245]. An integrating vector may be
directed to a specific locus in yeast by selecting the appropriate homologous sequence for inclusion in the
vector. See Orr-Weaver et al., supra. One or more expression constructs may integrate, possibly affecting
levels of recombinant protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The
chromosomal sequences included in the vector can occur either as a single segment in the vector, which
results in the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable integration
of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow
for the selection of yeast strains that have been transformed. Selectable markers may include biosynthetic
genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2, and TRP1, as well as ALG7 and
the G418 resistance gene, which confer resistance in yeast cells to tunicamycin and G418, respectively. In
addition, a suitable selectable marker may also provide yeast with the ability to grow in the presence of toxic
compounds, such as metals. For example, the presence of CUP1 allows yeast to grow in the presence of
copper ions [Butt et al. (1987) Microbiol. Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation vectors. 
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon
or developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been
developed for transformation into many yeasts. For example, expression vectors have been developed for,
inter alia, the following yeasts: Candida albicans [Kurtz et al. (1986) Mol. Cell. Biol. 6:142], Candida
maltosa [Kunze et al. (1985) J. Basic Microbiol. 25:141], Hansenula polymorpha [Gleeson et al. (1986) J.
Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302], Kluyveromyces fragilis
[Das et al. (1984) J. Bacteriol. 158:1165], Kluyveromyces lactis [De Louvencourt et al. (1983) J.
Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 8:135], Pichia guilliermondii [Kunze et al.
(1985) J. Basic Microbiol. 25:141], Pichia pastoris [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; US Patent
Nos. 4,837,148 and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse (1981)
Nature 300:706], and Yarrowia lipolytica [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al.
(1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually include
either the transformation of spheroplasts or of intact yeast cells treated with alkali cations. Transformation
procedures usually vary with the yeast species to be transformed. See eg. [Kurtz et al. (1986) Mol. Cell.
Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida]; [Gleeson et al. (1986) J. Gen.
Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302; Hansenula]; [Das et al. (1984)
J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den Berg et al. (1990)
Bio/Technology 8:135; Kluyveromyces]; [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J.
Basic Microbiol. 25:141; US Patent Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc.
Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse
(1981) Nature 300:706; Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al.
(1985) Curr. Genet. 10:49; Yarrowia].

Antibodies

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of at least 
one antibody combining site. An "antibody combining site" is the three-dimensional binding space with an 
internal surface shape and charge distribution complementary to the features of an epitope of an antigen, 
which allows a binding of the antibody with the antigen. "Antibody" includes, for example, vertebrate 
antibodies, hybrid antibodies, chimeric antibodies, humanised antibodies, altered antibodies, univalent
antibodies, Fab proteins, and single domain antibodies. 

Antibodies against the proteins of the invention are useful for affinity chromatography, immunoassays, and 
distinguishing/identifying Streptococcal proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably a
mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the
volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies.
Immunization is generally performed by mixing or emulsifying the protein in saline, preferably in an
adjuvant such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally (generally
subcutaneously or intramuscularly). A dose of 50-200 μg/injection is typically sufficient. Immunization is
generally boosted 2-6 weeks later with one or more injections of the protein in saline, preferably using
Freund's incomplete adjuvant. One may alternatively generate antibodies by in vitro immunization using
methods known in the art, which for the purposes of this invention is considered equivalent to in vivo
immunization. Polyclonal antiserum is obtained by bleeding the immunized animal into a glass or plastic
container, incubating the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The
serum is recovered by centrifugation (eg. 1,000 g for 10 minutes). About 20-50 ml per bleed may be
obtained from rabbits.
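
The centrifugation step above is specified as a relative centrifugal force (1,000 g), whereas most centrifuges are set in revolutions per minute. A minimal sketch of the conversion follows, using the standard relation RCF = 1.118 x 10^-5 x r x rpm^2 with r in centimetres. The rotor radius of 10 cm in the example is an assumption, since the text does not name a rotor, and the function names are hypothetical.

```python
import math

# Standard conversion factor for radius in centimetres and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_to_rpm(rcf_g, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("rcf_g and radius_cm must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    radius_cm = 10.0  # assumed rotor radius; use the value for the actual rotor
    rpm = rcf_to_rpm(1000, radius_cm)
    print(f"1,000 g at r = {radius_cm} cm corresponds to about {rpm:.0f} rpm")
```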

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature (1975)
256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described above.
However, rather than bleeding the animal to extract serum, the spleen (and optionally several large lymph
nodes) is removed and dissociated into single cells. If desired, the spleen cells may be screened (after
removal of nonspecifically adherent cells) by applying a cell suspension to a plate or well coated with the
protein antigen. B-cells expressing membrane-bound immunoglobulin specific for the antigen bind to the
plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or all dissociated spleen
cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a selective
medium (eg. hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated
by limiting dilution, and are assayed for production of antibodies which bind specifically to the immunizing
antigen (and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and
125I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes are typically
detected by their activity. For example, horseradish peroxidase is usually detected by its ability to convert
3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a spectrophotometer. "Specific
binding partner" refers to a protein capable of binding a ligand molecule with high specificity, as for
example in the case of an antigen and a monoclonal antibody specific therefor. Other specific binding
partners include biotin and avidin or streptavidin, IgG and protein A, and the numerous receptor-ligand
couples known in the art. It should be understood that the above description is not meant to categorize the
various labels into distinct classes, as the same label may serve in several different modes. For example, 125I
may serve as a radioactive label or as an electron-dense reagent. HRP may serve as enzyme or as antigen for
a MAb. Further, one may combine various labels for desired effect. For example, MAbs and avidin also
require labels in the practice of this invention: thus, one might label a MAb with biotin, and detect its
presence with avidin labeled with 125I, or with an anti-biotin MAb labeled with HRP. Other permutations
and possibilities will be readily apparent to those of ordinary skill in the art, and are considered as
equivalents within the scope of the instant invention.

Pharmaceutical Compositions

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the invention. 
The pharmaceutical compositions will comprise a therapeutically effective amount of either polypeptides, 
antibodies, or polynucleotides of the claimed invention. 

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic agent to
treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable therapeutic or
preventative effect. The effect can be detected by, for example, chemical markers or antigen levels.
Therapeutic effects also include reduction in physical symptoms, such as decreased body temperature. The
precise effective amount for a subject will depend upon the subject's size and health, the nature and extent
of the condition, and the therapeutics or combination of therapeutics selected for administration. Thus, it is
not useful to specify an exact effective amount in advance. However, the effective amount for a given
situation can be determined by routine experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg or 0.05
mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.
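
The per-kilogram ranges above translate into absolute amounts by simple multiplication with body weight. The sketch below only illustrates that arithmetic for the two stated ranges; the function name and the 70 kg example weight are assumptions, and the result is not a dosing recommendation.

```python
# Dose ranges stated in the text, in mg of DNA construct per kg body weight.
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_range_mg(weight_kg, mg_per_kg_range):
    """Return the (low, high) absolute dose in mg for a given body weight."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low, high = mg_per_kg_range
    return low * weight_kg, high * weight_kg


if __name__ == "__main__":
    weight_kg = 70.0  # hypothetical subject weight
    for label, rng in (("broad", BROAD_RANGE_MG_PER_KG),
                       ("narrow", NARROW_RANGE_MG_PER_KG)):
        low, high = dose_range_mg(weight_kg, rng)
        print(f"{label} range: {low:.2f} mg to {high:.2f} mg")
```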

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such as
antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any pharmaceutical
carrier that does not itself induce the production of antibodies harmful to the individual receiving the
composition, and which may be administered without undue toxicity. Suitable carriers may be large, slowly
metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids,
polymeric amino acids, amino acid copolymers, and inactive virus particles. Such carriers are well known to
those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as
acetates, propionates, malonates, benzoates, and the like. A thorough discussion of pharmaceutically
acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water, saline,
glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH
buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic compositions
are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or
suspension in, liquid vehicles prior to injection may also be prepared. Liposomes are included within the
definition of a pharmaceutically acceptable carrier.

Delivery Methods

Once formulated, the compositions of the invention can be administered directly to the subject. The subjects
to be treated can be animals; in particular, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously,
intraperitoneally, intravenously or intramuscularly or delivered to the interstitial space of a tissue. The
compositions can also be administered into a lesion. Other modes of administration include oral and
pulmonary administration, suppositories, and transdermal or transcutaneous applications (eg. see
WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or a
multiple dose schedule.

See also Delivery Strategies for Antisense Oligonucleotide Therapeutics (ed. Akhtar) ISBN 0849347785.

Vaccines

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or therapeutic (ie.
to treat disease after infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers" which include any carrier that does not
itself induce the production of antibodies harmful to the individual receiving the composition. Suitable
carriers are typically large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, lipid aggregates (such
as oil droplets or liposomes), and inactive virus particles. Such carriers are well known to those of ordinary
skill in the art. Additionally, these carriers may function as immunostimulating agents ("adjuvants").
Furthermore, the antigen or immunogen may be conjugated to a bacterial toxoid, such as a toxoid from
diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Vaccines of the invention may be administered in conjunction with other immunoregulatory 
agents. In particular, compositions will usually include an adjuvant. 

Preferred further adjuvants include, but are not limited to, one or more of the following set forth 
below: 

A. Mineral Containing Compositions

Mineral containing compositions suitable for use as adjuvants in the invention include mineral
salts, such as aluminium salts and calcium salts. The invention includes mineral salts such as
hydroxides (e.g. oxyhydroxides), phosphates (e.g. hydroxyphosphates, orthophosphates),
sulphates, etc. (e.g. see chapters 8 & 9 of ref. 1), or mixtures of different mineral compounds,
with the compounds taking any suitable form (e.g. gel, crystalline, amorphous, etc.), and with
adsorption being preferred. The mineral containing compositions may also be formulated as a
particle of metal salt. See ref. 2.

B. Oil-Emulsions 

Oil-emulsion compositions suitable for use as adjuvants in the invention include squalene-water
emulsions, such as MF59 (5% Squalene, 0.5% Tween 80, and 0.5% Span 85, formulated into
submicron particles using a microfluidizer). See ref. 3.

Complete Freund's adjuvant (CFA) and incomplete Freund's adjuvant (IF A) may also be used as 
adjuvants in the invention. 
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C. Saponin Formulations 

Saponin formulations may also be used as adjuvants in the invention. Saponins are a
heterologous group of sterol glycosides and triterpenoid glycosides that are found in the bark,
leaves, stems, roots and even flowers of a wide range of plant species. Saponins from the bark of
the Quillaia saponaria Molina tree have been widely studied as adjuvants. Saponin can also be
commercially obtained from Smilax ornata (sarsaparilla), Gypsophila paniculata (bride's veil),
and Saponaria officinalis (soap root). Saponin adjuvant formulations include purified
formulations, such as QS21, as well as lipid formulations, such as ISCOMs.

Saponin compositions have been purified using High Performance Thin Layer Chromatography
(HP-TLC) and Reversed Phase High Performance Liquid Chromatography (RP-HPLC). Specific
purified fractions using these techniques have been identified, including QS7, QS17, QS18,
QS21, QH-A, QH-B and QH-C. Preferably, the saponin is QS21. A method of production of
QS21 is disclosed in U.S. Patent No. 5,057,540. Saponin formulations may also comprise a
sterol, such as cholesterol (see WO 96/33739).

Combinations of saponins and cholesterols can be used to form unique particles called
Immunostimulating Complexes (ISCOMs). ISCOMs typically also include a phospholipid such
as phosphatidylethanolamine or phosphatidylcholine. Any known saponin can be used in
ISCOMs. Preferably, the ISCOM includes one or more of Quil A, QHA and QHC. ISCOMs are
further described in EP 0 109 942, WO 96/11711 and WO 96/33739. Optionally, the ISCOMs
may be devoid of additional detergent. See ref. 4.

A review of the development of saponin based adjuvants can be found at ref. 5. 

C. Virosomes and Virus Like Particles (VLPs) 

Virosomes and Virus Like Particles (VLPs) can also be used as adjuvants in the invention.
These structures generally contain one or more proteins from a virus optionally combined or
formulated with a phospholipid. They are generally non-pathogenic, non-replicating and
generally do not contain any of the native viral genome. The viral proteins may be recombinantly
produced or isolated from whole viruses. These viral proteins suitable for use in virosomes or
VLPs include proteins derived from influenza virus (such as HA or NA), Hepatitis B virus (such
as core or capsid proteins), Hepatitis E virus, measles virus, Sindbis virus, Rotavirus,
Foot-and-Mouth Disease virus, Retrovirus, Norwalk virus, human Papilloma virus, HIV, RNA-phages,
Qβ-phage (such as coat proteins), GA-phage, fr-phage, AP205 phage, and Ty (such as
retrotransposon Ty protein p1). VLPs are discussed further in WO 03/024480, WO 03/024481,
and Refs. 6, 7, 8 and 9. Virosomes are discussed further in, for example, Ref. 10.

D. Bacterial or Microbial Derivatives 

Adjuvants suitable for use in the invention include bacterial or microbial derivatives such as: 

(1) Non-toxic derivatives of enterobacterial lipopolysaccharide (LPS) 

Such derivatives include Monophosphoryl lipid A (MPL) and 3-O-deacylated MPL (3dMPL). 
3dMPL is a mixture of 3 De-O-acylated monophosphoryl lipid A with 4, 5 or 6 acylated chains. 
A preferred "small particle" form of 3 De-O-acylated monophosphoryl lipid A is disclosed in EP 
0 689 454. Such "small particles" of 3dMPL are small enough to be sterile filtered through a 
0.22 micron membrane (see EP 0 689 454). Other non-toxic LPS derivatives include 
monophosphoryl lipid A mimics, such as aminoalkyl glucosaminide phosphate derivatives e.g. 
RC-529. See Ref. 11.

(2) Lipid A Derivatives 

Lipid A derivatives include derivatives of lipid A from Escherichia coli such as OM-174.
OM-174 is described for example in Refs. 12 and 13.

(3) Immunostimulatory oligonucleotides 

Immunostimulatory oligonucleotides suitable for use as adjuvants in the invention include 
nucleotide sequences containing a CpG motif (a sequence containing an unmethylated cytosine 
followed by guanosine and linked by a phosphate bond). Bacterial double stranded RNA or 
oligonucleotides containing palindromic or poly(dG) sequences have also been shown to be 
immunostimulatory.

The CpG's can include nucleotide modifications/analogs such as phosphorothioate modifications 
and can be double-stranded or single-stranded. Optionally, the guanosine may be replaced with 
an analog such as 2'-deoxy-7-deazaguanosine. See ref. 14, WO 02/26757 and WO 99/62923 for 
examples of possible analog substitutions. The adjuvant effect of CpG oligonucleotides is further 
discussed in Refs. 15, 16, WO 98/40100, U.S. Patent No. 6,207,646, U.S. Patent No. 6,239,116, 
and U.S. Patent No. 6,429,199. 

The CpG sequence may be directed to TLR9, such as the motif GTCGTT or TTCGTT. See ref.
17. The CpG sequence may be specific for inducing a Th1 immune response, such as a CpG-A
ODN, or it may be more specific for inducing a B cell response, such as a CpG-B ODN. CpG-A
and CpG-B ODNs are discussed in refs. 18, 19 and WO 01/95935. Preferably, the CpG is a
CpG-A ODN.

Preferably, the CpG oligonucleotide is constructed so that the 5' end is accessible for receptor
recognition. Optionally, two CpG oligonucleotide sequences may be attached at their 3' ends to
form "immunomers". See, for example, refs. 20, 21, 22 and WO 03/035836.
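By way of illustration only, a candidate oligonucleotide could be screened for the TLR9-directed motifs named above (GTCGTT, TTCGTT) by plain string matching. The following is a minimal sketch, not part of the disclosed method: the function names and the example sequence are hypothetical, and the check looks at sequence alone, so it says nothing about methylation state or backbone modifications.

```python
# Hypothetical helper names; the two motifs are the ones quoted in the text.
# The sequence is assumed to be written 5'->3' using the letters A, C, G, T.
TLR9_MOTIFS = ("GTCGTT", "TTCGTT")


def find_cpg_motifs(sequence, motifs=TLR9_MOTIFS):
    """Return a sorted list of (0-based position, motif) for every occurrence."""
    seq = sequence.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            # Restart one base further on so overlapping matches are found.
            start = seq.find(motif, start + 1)
    return sorted(hits)


def count_cpg_dinucleotides(sequence):
    """Count CG dinucleotides (a C immediately followed by a G)."""
    return sequence.upper().count("CG")


if __name__ == "__main__":
    example = "TCGTCGTTTTGTCGTT"  # illustrative sequence only
    print(find_cpg_motifs(example))
    print(count_cpg_dinucleotides(example))
```

Positions are reported from the 5' end, which may help when judging whether a motif lies near the end that is to remain accessible for receptor recognition.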

(4) ADP-ribosylating toxins and detoxified derivatives thereof

Bacterial ADP-ribosylating toxins and detoxified derivatives thereof may be used as adjuvants in
the invention. Preferably, the protein is derived from E. coli (i.e., E. coli heat labile enterotoxin
"LT"), cholera ("CT"), or pertussis ("PT"). The use of detoxified ADP-ribosylating toxins as
mucosal adjuvants is described in WO 95/17211 and as parenteral adjuvants in WO 98/42375.
The toxin or toxoid is preferably in the form of a holotoxin, comprising both A and B subunits.
Preferably, the A subunit contains a detoxifying mutation; preferably the B subunit is not
mutated. Preferably, the adjuvant is a detoxified LT mutant such as LT-K63, LT-R72, and
LTR192G. The use of ADP-ribosylating toxins and detoxified derivatives thereof, particularly
LT-K63 and LT-R72, as adjuvants is described in Refs. 23, 24, 25, 26, 27, 28, 29 and 30, each
of which is specifically incorporated by reference herein in its entirety. Numerical reference
for amino acid substitutions is preferably based on the alignments of the A and B subunits of
ADP-ribosylating toxins set forth in Domenighini et al., Mol. Microbiol. (1995) 15(6):1165-1167,
specifically incorporated herein by reference in its entirety.

E. Human Immunomodulators

Human immunomodulators suitable for use as adjuvants in the invention include cytokines, such
as interleukins (e.g. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (e.g. interferon-γ),
macrophage colony stimulating factor, and tumor necrosis factor.

F. Bioadhesives and Mucoadhesives 

Bioadhesives and mucoadhesives may also be used as adjuvants in the invention. Suitable 
bioadhesives include esterified hyaluronic acid microspheres (Ref. 31) or mucoadhesives such as 
cross-linked derivatives of poly(acrylic acid), polyvinyl alcohol, polyvinyl pyrrolidone,
polysaccharides and carboxymethylcellulose. Chitosan and derivatives thereof may also be used 
as adjuvants in the invention. E.g., ref. 32. 

G. Microparticles 

Microparticles may also be used as adjuvants in the invention. Microparticles (i.e. a particle of
~100nm to ~150μm in diameter, more preferably ~200nm to ~30μm in diameter, and most
preferably ~500nm to ~10μm in diameter) formed from materials that are biodegradable and
non-toxic (e.g. a poly(α-hydroxy acid), a polyhydroxybutyric acid, a polyorthoester, a
polyanhydride, a polycaprolactone, etc.), with poly(lactide-co-glycolide), are preferred,
optionally treated to have a negatively-charged surface (e.g. with SDS) or a positively-charged
surface (e.g. with a cationic detergent, such as CTAB).
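The three nested diameter ranges above can be read as preference tiers. The sketch below is illustrative only: the tier labels and the function name are hypothetical, and the bounds in the text are approximate ("~"), so the hard cut-offs used here are a simplifying assumption.

```python
# Diameter ranges quoted in the text, converted to nanometres (1 um = 1000 nm).
# Tiers are listed narrowest first so the most specific match is returned.
# The labels are illustrative names, not terms defined in the specification.
SIZE_TIERS_NM = (
    ("most preferred", 500.0, 10_000.0),
    ("more preferred", 200.0, 30_000.0),
    ("microparticle", 100.0, 150_000.0),
)


def classify_diameter(diameter_nm):
    """Return the narrowest tier containing diameter_nm, or None if outside all."""
    for label, low, high in SIZE_TIERS_NM:
        if low <= diameter_nm <= high:
            return label
    return None


if __name__ == "__main__":
    for d in (50, 250, 1_000, 120_000):
        print(d, "nm ->", classify_diameter(d))
```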

H. Liposomes 

Examples of liposome formulations suitable for use as adjuvants are described in U.S. Patent No. 
6,090,406, U.S. Patent No. 5,916,588, and EP 0 626 169.

I. Polyoxyethylene Ether and Polyoxyethylene Ester Formulations

Adjuvants suitable for use in the invention include polyoxyethylene ethers and polyoxyethylene 
esters. Ref. 33. Such formulations further include polyoxyethylene sorbitan ester surfactants in 
combination with an octoxynol (Ref. 34) as well as polyoxyethylene alkyl ethers or ester 
surfactants in combination with at least one additional non-ionic surfactant such as an octoxynol 
(Ref. 35). 

Preferred polyoxyethylene ethers are selected from the following group: polyoxyethylene-9-lauryl
ether (laureth 9), polyoxyethylene-9-stearyl ether, polyoxyethylene-8-stearyl ether,
polyoxyethylene-4-lauryl ether, polyoxyethylene-35-lauryl ether, and polyoxyethylene-23-lauryl
ether.

J. Polyphosphazene (PCPP)

PCPP formulations are described, for example, in Refs. 36 and 37.

K. Muramyl peptides

Examples of muramyl peptides suitable for use as adjuvants in the invention include
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine
(nor-MDP), and
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE).

L. Imidazoquinolone Compounds

Examples of imidazoquinolone compounds suitable for use as adjuvants in the invention include
Imiquimod and its homologues, described further in Refs. 38 and 39.

The invention may also comprise combinations of aspects of one or more of the adjuvants 
identified above. For example, the following adjuvant compositions may be used in the 
invention: 

(1) a saponin and an oil-in-water emulsion (ref. 40); 

(2) a saponin (e.g., QS21) + a non-toxic LPS derivative (e.g., 3dMPL) (see WO
94/00153);

(3) a saponin (e.g., QS21) + a non-toxic LPS derivative (e.g., 3dMPL) + a
cholesterol; 

(4) a saponin (e.g. QS21) + 3dMPL + IL-12 (optionally + a sterol) (Ref. 41); 
combinations of 3dMPL with, for example, QS21 and/or oil-in-water emulsions (Ref. 42); 



(5) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-block polymer 
L121, and thr-MDP, either microfluidized into a submicron emulsion or vortexed to generate a 
larger particle size emulsion. 

(6) Ribi™ adjuvant system (RAS), (Ribi Immunochem) containing 2% Squalene, 
0.2% Tween 80, and one or more bacterial cell wall components from the group consisting of 
monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS),
preferably MPL + CWS (Detox™); and 

(7) one or more mineral salts (such as an aluminum salt) + a non-toxic derivative of 
LPS (such as 3dMPL).

Aluminium salts and MF59 are preferred adjuvants for parenteral immunisation. Mutant bacterial 
toxins are preferred mucosal adjuvants. 

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/nucleic acid,
pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such as water, saline,
glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH 
buffering substances, and the like, may be present in such vehicles. 

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or 
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection may also 
be prepared. The preparation also may be emulsified or encapsulated in liposomes for enhanced adjuvant 
effect, as discussed above under pharmaceutically acceptable carriers. 

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the 
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components, as 
needed. By "immunologically effective amount", it is meant that the administration of that amount to an 
individual, either in a single dose or as part of a series, is effective for treatment or prevention. This amount 
varies depending upon the health and physical condition of the individual to be treated, the taxonomic group 
of individual to be treated (eg. nonhuman primate, primate, etc.), the capacity of the individual's immune 
system to synthesize antibodies, the degree of protection desired, the formulation of the vaccine, the treating 
doctor's assessment of the medical situation, and other relevant factors. It is expected that the amount will 
fall in a relatively broad range that can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection, either
subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734). Additional formulations
suitable for other modes of administration include oral and pulmonary formulations, suppositories, and 
transdermal applications. Dosage treatment may be a single dose schedule or a multiple dose schedule. The 
vaccine may be administered in conjunction with other immunoregulatory agents.

As an alternative to protein-based vaccines, DNA vaccination may be used [eg. Robinson & Torres (1997)
Seminars in Immunol 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 15:617-648; later herein].

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of the 
invention, to be delivered to the mammal for expression in the mammal, can be administered either locally 
or systemically. These constructs can utilize viral or non-viral vector approaches in in vivo or ex vivo 
modality. Expression of such coding sequence can be induced using endogenous mammalian or 
heterologous promoters. Expression of the coding sequence in vivo can be either constitutive or regulated. 
The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid 
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral, 
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can also be an 
astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus, picornavirus, poxvirus, 
or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy 1:51-64; Kimura (1994) Human 
Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy 6:185-193; and Kaplitt (1994) Nature 
Genetics 6:148-153. 

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector is
employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for example,
NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses eg. MCF and
MCF-MLV (see Kelly (1983) J. Virol. 45:291), spumaviruses and lentiviruses. See RNA Tumor Viruses,
Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For example, 
retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site from a Rous Sarcoma 
Virus, a packaging signal from a Murine Leukemia Virus, and an origin of second strand synthesis from an 
Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral vector 
particles by introducing them into appropriate packaging cell lines (see US patent 5,591,624). Retrovirus 
vectors can be constructed for site-specific integration into host cell DNA by incorporation of a chimeric 
integrase enzyme into the retroviral particle (see WO96/37626). It is preferable that the recombinant viral
vector is a replication defective recombinant virus. 

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known in the art, 
are readily prepared (see WO95/30763 and WO92/05266), and can be used to create producer cell lines 
(also termed vector cell lines or "VCLs") for the production of recombinant vector particles. Preferably, the 
packaging cell lines are made from human parent cells (eg. HT1080 cells) or mink parent cell lines, which 
eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian Leukosis Virus,
Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing Virus, Murine Sarcoma
Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly preferred Murine Leukemia
Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol 19:19-25), Abelson (ATCC No.
VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No. VR-590), Kirsten, Harvey Sarcoma Virus
and Rauscher (ATCC No. VR-998) and Moloney Murine Leukemia Virus (ATCC No. VR-190). Such
retroviruses may be obtained from depositories or collections such as the American Type Culture Collection
("ATCC") in Rockville, Maryland or isolated from known sources using commonly available techniques.
Exemplary known retroviral gene therapy vectors employable in this invention include those described in
patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468, WO89/05349,
WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698, WO93/25234, WO93/11230,
WO93/10218, WO91/02805, WO91/02825, WO95/07994, US 5,219,740, US 4,405,712, US 4,861,719, US
4,980,289, US 4,777,127, US 5,591,624. See also Vile (1993) Cancer Res 53:3860-3864; Vile (1993)
Cancer Res 53:962-967; Ram (1993) Cancer Res 53:83-88; Takamiya (1992) J Neurosci Res
33:493-503; Baba (1993) J Neurosurg 79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad
Sci 81:6349; and Miller (1990) Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention. See, for 
example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and WO93/07283, 
WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors employable in this 
invention include those described in the above referenced documents and in WO94/12649, WO93/03769,
WO93/19191, WO94/28938, WO95/11984, WO95/00655, WO95/27071, WO95/29993, WO95/34671,
WO96/05320, WO94/08026, WO94/11506, WO93/06223, WO94/24299, WO95/14102, WO95/24297,
WO95/02697, WO94/28152, WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and
WO95/09654. Alternatively, administration of DNA linked to killed adenovirus as described in Curiel
(1992) Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such vectors for use 
in this invention are the AAV-2 based vectors disclosed in Srivastava, WO93/09239. Most preferred AAV 
vectors comprise the two AAV inverted terminal repeats in which the native D-sequences are modified by 
substitution of nucleotides, such that at least 5 native nucleotides and up to 18 native nucleotides, preferably 
at least 10 native nucleotides up to 18 native nucleotides, most preferably 10 native nucleotides are retained 
and the remaining nucleotides of the D-sequence are deleted or replaced with non-native nucleotides. The 
native D-sequences of the AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in 
each AAV inverted terminal repeat (ie. there is one sequence at each end) which are not involved in HP 
formation. The non-native replacement nucleotide may be any nucleotide other than the nucleotide found in
the native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of such an
AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV vector is the
Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US Patent 5,478,745. Still
other vectors are those disclosed in Carter US Patent 4,797,368 and Muzyczka US Patent 5,139,941,
Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a further example of an AAV vector
employable in this invention is SSV9AFABTKneo, which contains the AFP enhancer and albumin
promoter and directs expression predominantly in the liver. Its structure and construction are disclosed in Su
(1996) Human Gene Therapy 7:463-470. Additional AAV gene therapy vectors are described in US
5,354,678, US 5,173,414, US 5,139,941, and US 5,252,479.

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred examples are
herpes simplex virus vectors containing a sequence encoding a thymidine kinase polypeptide such as those
disclosed in US 5,288,641 and EP0176170 (Roizman). Additional exemplary herpes simplex virus vectors
include HFEM/ICP6-LacZ disclosed in WO95/04139 (Wistar Institute), pHSVlac described in Geller
(1988) Science 241:1667-1669 and in WO90/09441 and WO92/07945, HSV Us3::pgC-lacZ described in
Fink (1992) Human Gene Therapy 3:11-19 and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242
(Breakefield), and those deposited with the ATCC with accession numbers VR-977 and VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention. Preferred
alpha virus vectors are Sindbis virus vectors. Other examples include togaviruses, Semliki Forest virus
(ATCC VR-67; ATCC VR-1247), Middelburg virus (ATCC VR-370), Ross River virus (ATCC VR-373; ATCC VR-1246),
Venezuelan equine encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532),
and those described in US patents 5,091,309, 5,217,879, and WO92/10578. More particularly, those alpha
virus vectors described in US Serial No. 08/405,627, filed March 15, 1995, WO94/21792, WO92/10578,
WO95/07994, US 5,091,309 and US 5,217,879 are employable. Such alpha viruses may be obtained from
depositories or collections such as the ATCC in Rockville, Maryland or isolated from known sources using
commonly available techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see
USSN 08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing the
nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered expression
systems. Preferably, the eukaryotic layered expression systems of the invention are derived from alphavirus
vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990) J
Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC VR-111 and
ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317; Flexner (1989) Ann
NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 4,769,330 and WO89/01973;
SV40 virus, for example ATCC VR-305 and those described in Mulligan (1979) Nature 277:108 and
Madzak (1992) J Gen Virol 73:1533; influenza virus, for example ATCC VR-797 and recombinant
influenza viruses made employing reverse genetics techniques as described in US 5,166,057 and in Enami
(1990) Proc Natl Acad Sci 87:3802-3805; Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989)
Cell 59:110, (see also McMichael (1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature
(1979) 277:108); human immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992)
J. Virol. 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC
VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC VR-369
and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for example ATCC
VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu virus, for example
ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example
ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for example ATCC VR-374;
Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus,
Eastern encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for
example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC VR-1252; and coronavirus, for
example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral vectors.
Other delivery methods and media may be employed such as, for example, nucleic acid expression vectors,
polycationic condensed DNA linked or unlinked to killed adenovirus alone, for example see US Serial No.
08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther 3:147-154, ligand linked DNA, for
example see Wu (1989) J Biol Chem 264:16985-16987, eucaryotic cell delivery vehicles cells, for example
see US Serial No. 08/240,030, filed May 9, 1994, and US Serial No. 08/404,796, deposition of
photopolymerized hydrogel materials, hand-held gene transfer particle gun, as described in US Patent
5,149,655, ionizing radiation as described in US 5,206,152 and in WO92/11033, nucleic charge
neutralization or fusion with cell membranes. Additional approaches are described in Philip (1994) Mol Cell
Biol 14:2411-2418 and in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. Briefly, the 
sequence can be inserted into conventional vectors that contain conventional control sequences for high 
level expression, and then incubated with synthetic gene transfer molecules such as polymeric
DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting ligands such as
asialoorosomucoid, as described in Wu & Wu (1987) J. Biol Chem. 262:4429-4432, insulin as described in 
Hucked (1990) Biochem Pharmacol 40:253-263, galactose as described in Plank (1992) Bioconjugate Chem 
3:533-539, lactose or transferrin. 

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in WO 
90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex beads. DNA 
coated latex beads are efficiently transported into cells after endocytosis initiation by the beads. The method 
may be improved further by treatment of the beads to increase hydrophobicity and thereby facilitate 
disruption of the endosome and release of the DNA into the cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral delivery, the
nucleic acid sequences encoding a polypeptide can be inserted into conventional vectors that contain
conventional control sequences for high level expression, and then be incubated with synthetic gene transfer
molecules such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell
targeting ligands such as asialoorosomucoid, insulin, galactose, lactose, or transferrin. Other delivery
systems include the use of liposomes to encapsulate DNA comprising the gene under the control of a
variety of tissue-specific or ubiquitously-active promoters. Further non-viral delivery suitable for use
includes mechanical delivery systems such as the approach described in Woffendin et al (1994) Proc. Natl.
Acad. Sci. USA 91(24):11581-11585. Moreover, the coding sequence and the product of expression of such
can be delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for activating
transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 and
4,762,915; in WO 95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer, Biochemistry,
pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem Biophys Acta 600:1; Bayer
(1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol 149:119; Wang (1987) Proc Natl
Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy vehicle, as
the term is defined above. For purposes of the present invention, an effective dose will be from about 0.01
mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is
administered.
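As a worked illustration of the per-weight dose ranges just stated, the arithmetic can be sketched as follows. This is a minimal sketch for illustration only, not dosing guidance: the function name and the 70 kg example weight are assumptions, and only the two mg/kg ranges are taken from the text.

```python
# Dose bounds quoted in the text, in mg of DNA construct per kg of body weight.
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_window_mg(weight_kg, range_mg_per_kg=BROAD_RANGE_MG_PER_KG):
    """Return (lowest, highest) total dose in mg for an individual of weight_kg."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low, high = range_mg_per_kg
    return (low * weight_kg, high * weight_kg)


if __name__ == "__main__":
    weight = 70.0  # hypothetical body weight in kg
    print("broad range (mg): ", dose_window_mg(weight))
    print("narrow range (mg):", dose_window_mg(weight, NARROW_RANGE_MG_PER_KG))
```

For the hypothetical 70 kg individual this scales the broad range to roughly 0.7 mg to 3500 mg of DNA construct, and the narrower range to roughly 3.5 mg to 700 mg.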



Delivery Methods 

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly to the 
subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for expression of recombinant 
proteins. The subjects to be treated can be mammals or birds. Also, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously,
intraperitoneally, intravenously or intramuscularly or delivered to the interstitial space of a tissue. The
compositions can also be administered into a lesion. Other modes of administration include oral and
pulmonary administration, suppositories, and transdermal or transcutaneous applications (eg. see
WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or a
multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art
and described in eg. WO93/14778. Examples of cells useful in ex vivo applications include, for example,
stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic cells, or tumor cells.

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by the
following procedures, for example, dextran-mediated transfection, calcium phosphate precipitation,
polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s)
in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art.

Polynucleotide and polypeptide pharmaceutical compositions

The terms "polynucleotide" and "nucleic acid", used interchangeably herein,

In addition to the pharmaceutically acceptable carriers and salts described above, the following additional
agents can be used with polynucleotide and/or polypeptide compositions.

A. Polvpeptides 

One example are polypeptides which include, without limitation: asioloorosomucoid (ASOR); transferrin; 
asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons, granulocyte, 
25 macrophage colony stimulating factor (GM-CSF), granulocyte colony stimulating factor (G-CSF), 
macrophage colony stimulating factor (M-CSF), stem cell factor and erythropoietin. Viral antigens, such as 
envelope proteins, can also be used. Also, proteins from other invasive organisms, such as the 17 amino 
acid peptide from the circumsporozoite protein of Plasmodium falciparum known as R1I. 

B. Hormones, Vitamins, etc. 

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, thyroid
hormone, or vitamins, folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a preferred
embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or polysaccharides
can be included. In a preferred embodiment of this aspect, the polysaccharide is dextran or DEAE-dextran.
Also, chitosan and poly(lactide-co-glycolide)

D. Lipids, and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes prior to
delivery to the subject or to cells derived therefrom.

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or entrap and
retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary but will generally be
around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the use of liposomes as carriers
for delivery of nucleic acids, see, Hug and Sleight (1991) Biochim. Biophys. Acta, 1097:1-17; Straubinger
(1983) Meth Enzymol 101:512-527.
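
By way of illustration only, the nominal DNA:lipid ratio stated above can be expressed as a simple calculation. The following Python sketch assumes the approximately 1:1 ratio (mg DNA:micromoles lipid) as its default; the function name and the `lipid_per_mg_dna` parameter are hypothetical and are not part of the disclosure.

```python
def lipid_micromoles(dna_mg: float, lipid_per_mg_dna: float = 1.0) -> float:
    """Micromoles of lipid to combine with `dna_mg` milligrams of condensed DNA.

    The default of 1.0 reflects the approximately 1:1 (mg DNA:micromoles lipid)
    ratio given in the text; larger values model "or more of lipid".
    """
    if dna_mg < 0:
        raise ValueError("DNA mass must be non-negative")
    if lipid_per_mg_dna <= 0:
        raise ValueError("lipid ratio must be positive")
    return dna_mg * lipid_per_mg_dna


print(lipid_micromoles(0.25))       # 0.25 mg DNA at the nominal ratio
print(lipid_micromoles(0.25, 2.0))  # same DNA with twice the lipid
```

A suitable ratio for any particular preparation would still be determined empirically.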

Liposomal preparations for use in the present invention include cationic (positively charged), anionic
(negatively charged) and neutral preparations. Cationic liposomes have been shown to mediate intracellular
delivery of plasmid DNA (Felgner (1987) Proc. Natl Acad. Sci. USA 84:7413-7416); mRNA (Malone
(1989) Proc. Natl Acad. Sci. USA 86:6077-6081); and purified transcription factors (Debs (1990) J. Biol
Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example,
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes are available under the
trademark Lipofectin, from GIBCO BRL, Grand Island, NY. (See, also, Felgner supra). Other
commercially available liposomes include Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer).

Other cationic liposomes can be prepared from readily available materials using techniques well known in
the art. See, eg. Szoka (1978) Proc. Natl Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of
the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC),
dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others. These
materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate ratios. Methods
for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), or large
unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared using methods
known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka (1978) Proc. Natl Acad.
Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta 394:483; Wilson (1979) Cell
17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629; Ostro (1977) Biochem. Biophys. Res.
Commun. 76:836; Fraley (1979) Proc. Natl Acad. Sci. USA 76:3348; Enoch & Strittmatter (1979) Proc.
Natl Acad. Sci. USA 76:145; Fraley (1980) J. Biol Chem. 255:10431; Szoka & Papahadjopoulos
(1978) Proc. Natl Acad. Sci. USA 75:145; and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. Examples of
lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants, fragments, or
fusions of these proteins can also be used. Also, modifications of naturally occurring lipoproteins can be
used, such as acetylated LDL. These lipoproteins can target the delivery of polynucleotides to cells
expressing lipoprotein receptors. Preferably, if lipoproteins are included with the polynucleotide to be
delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are known as
apoproteins. At the present, apoproteins A, B, C, D, and E have been isolated and identified. At least two of
these contain several proteins, designated by Roman numerals, AI, AII, AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring chylomicrons
comprise A, B, C & E; over time these lipoproteins lose A and acquire C & E. VLDL comprises A, B, C
& E apoproteins; LDL comprises apoprotein B; and HDL comprises apoproteins A, C, & E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985)
Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem 261:12918;
Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of naturally
occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The composition of the
lipids is chosen to aid in conformation of the apoprotein for receptor binding activity. The composition of
lipids can also be chosen to facilitate hydrophobic interaction and association with the polynucleotide
binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. Such
methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biol. Chem. 255:5454-5460 and Mahley
(1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or recombinant methods by
expression of the apoprotein genes in a desired host cell. See, for example, Atkinson (1986) Annu Rev
Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta 30:443. Lipoproteins can also be
purchased from commercial suppliers, such as Biomedical Technologies, Inc., Stoughton, Massachusetts,
USA. Further description of lipoproteins can be found in Zuckermann et al. PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired 
polynucleotide/polypeptide to be delivered. 

Polycationic agents, typically, exhibit a net positive charge at physiologically relevant pH and are capable of
neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired location. These agents
have in vitro, ex vivo, and in vivo applications. Polycationic agents can be used to deliver nucleic acids
to a living subject either intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin, DNA
binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such as ΦX174.
Transcriptional factors also contain domains that bind DNA and therefore may be useful as nucleic acid
condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos, AP-1, AP-2, AP-3, CPF,
Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and physical properties of a polycationic agent can be extrapolated from the list
above, to construct other polypeptide polycationic agents or to produce synthetic polycationic agents.
Synthetic polycationic agents which are useful include, for example, DEAE-dextran, polybrene.
Lipofectin™, and lipofectAMINE™ are monomers that form polycationic complexes when combined with
polynucleotides/polypeptides.

Immunodiagnostic Assays

Streptococcus antigens of the invention can be used in immunoassays to detect antibody levels (or,
conversely, anti-Streptococcus antibodies can be used to detect antigen levels). Immunoassays based on
well defined, recombinant antigens can be developed to replace invasive diagnostic methods. Antibodies to
Streptococcus proteins within biological samples, including for example, blood or serum samples, can be
detected. Design of the immunoassays is subject to a great deal of variation, and a variety of these are
known in the art. Protocols for the immunoassay may be based, for example, upon competition, or direct
reaction, or sandwich type assays. Protocols may also, for example, use solid supports, or may be by
immunoprecipitation. Most assays involve the use of labeled antibody or polypeptide; the labels may be, for
example, fluorescent, chemiluminescent, radioactive, or dye molecules. Assays which amplify the signals
from the probe are also known; examples of which are assays which utilize biotin and avidin, and
enzyme-labeled and mediated immunoassays, such as ELISA assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed by
packaging the appropriate materials, including the compositions of the invention, in suitable containers,
along with the remaining reagents and materials (for example, suitable buffers, salt solutions, etc.) required
for the conduct of the assay, as well as a suitable set of assay instructions.

Use of Polypeptides to Screen for Peptide Analogs and Antagonists

Polypeptides encoded by the instant polynucleotides and corresponding full length genes can be used to 
screen peptide libraries to identify binding partners, such as receptors, from within the library. Peptide 
libraries can be synthesized according to methods known in the art (e.g. US patent 5,010,175;
WO91/17823). Agonists or antagonists of the polypeptides of the invention can be screened using any
available method known in the art, such as signal transduction, antibody binding, receptor binding, 
mitogenic assays, chemotaxis assays, etc. The assay conditions ideally should resemble the conditions under 
which the native activity is exhibited in vivo, that is, under physiologic pH, temperature, and ionic strength. 
Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the native activity at 
concentrations that do not cause toxic side effects in the subject. Agonists or antagonists that compete for 
binding to the native polypeptide can require concentrations equal to or greater than the native 
concentration, while inhibitors capable of binding irreversibly to the polypeptide can be added in 
concentrations on the order of the native concentration. 

Such screening and experimentation can lead to identification of a polypeptide binding partner, such as a 
receptor, encoded by a gene or a cDNA corresponding to a polynucleotide described herein, and at least one 
peptide agonist or antagonist of the binding partner. Such agonists and antagonists can be used to modulate, 
enhance, or inhibit receptor function in cells to which the receptor is native, or in cells that possess the 
receptor as a result of genetic engineering. Further, if the receptor shares biologically important 
characteristics with a known receptor, information about agonist/antagonist binding can facilitate 
development of improved agonists/antagonists of the known receptor. 
Identification of anti-bacterial agents 
Drug Screening Assays 

Of particular interest in the present invention is the identification of agents that have activity in modulating 
expression of one or more of the adhesion-specific genes described herein, so as to inhibit infection and/or 
disease. Of particular interest are screening assays for agents that have a low toxicity for human cells. 
The term "agent" as used herein describes any molecule with the capability of altering or mimicking the 
expression or physiological function of a gene product of a differentially expressed gene. Generally a 
plurality of assay mixtures are run in parallel with different agent concentrations to obtain a differential 
response to the various concentrations. Typically, one of these concentrations serves as a negative control 
i.e. at zero concentration or below the level of detection. 
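
As a non-limiting sketch of the parallel assay design just described, the following Python function builds a set of agent concentrations that includes a zero-concentration negative control. The serial-dilution scheme, the function name and all parameters are assumptions made for illustration; the text itself does not prescribe how the concentrations are chosen.

```python
from typing import List


def concentration_series(top: float, dilution_factor: float, n_points: int) -> List[float]:
    """Return assay concentrations in ascending order, led by a zero control.

    `top` is the highest agent concentration (arbitrary units); each further
    point is the previous one divided by `dilution_factor` (an assumed scheme).
    """
    if top <= 0 or dilution_factor <= 1 or n_points < 1:
        raise ValueError("need top > 0, dilution_factor > 1 and n_points >= 1")
    dilutions = [top / dilution_factor ** i for i in range(n_points)]
    # The zero-concentration mixture serves as the negative control.
    return [0.0] + sorted(dilutions)


for conc in concentration_series(100.0, 10.0, 3):
    print(conc)
```

Each returned value would correspond to one assay mixture run in parallel.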

Candidate agents encompass numerous chemical classes, including, but not limited to, organic molecules
(e.g. small organic compounds having a molecular weight of more than 50 and less than about 2,500
daltons), peptides, antisense polynucleotides, and ribozymes, and the like. Candidate agents can comprise
functional groups necessary for structural interaction with proteins, particularly hydrogen bonding, and 
typically include at least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least two of the 
functional chemical groups. The candidate agents often comprise cyclical carbon or heterocyclic structures 
and/or aromatic or polyaromatic structures substituted with one or more of the above functional groups. 
Candidate agents are also found among biomolecules including, but not limited to: polynucleotides, 
peptides, saccharides, fatty acids, steroids, purines, pyrimidines, derivatives, structural analogs or 
combinations thereof. 
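
Purely as an illustration of the molecular weight window mentioned above for small organic compounds (more than 50 and less than about 2,500 daltons), a candidate library could be pre-filtered as in the following Python sketch. The library contents, agent names and function name are hypothetical, and the upper bound is treated as a hard cutoff only for simplicity.

```python
from typing import Dict, List

MIN_MW_DALTONS = 50.0    # lower bound stated in the text
MAX_MW_DALTONS = 2500.0  # "about 2,500" in the text; hard cutoff assumed here


def small_organic_candidates(library: Dict[str, float]) -> List[str]:
    """Return names of agents whose molecular weight lies inside the window."""
    return sorted(
        name for name, mw in library.items()
        if MIN_MW_DALTONS < mw < MAX_MW_DALTONS
    )


# Hypothetical library: agent name -> molecular weight in daltons.
example_library = {"agent_A": 18.0, "agent_B": 334.4, "agent_C": 2499.0, "agent_D": 5800.0}
print(small_organic_candidates(example_library))
```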

Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural 
compounds. For example, numerous means are available for random and directed synthesis of a wide 
variety of organic compounds and biomolecules, including expression of randomized oligonucleotides and 
oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and 
animal extracts are available or readily produced. Additionally, natural or synthetically produced libraries 
and compounds are readily modified through conventional chemical, physical and biochemical means, and 
may be used to produce combinatorial libraries. Known pharmacological agents may be subjected to 
directed or random chemical modifications, such as acylation, alkylation, esterification, amidification, etc. 
to produce structural analogs. 
Screening of Candidate Agents In Vitro 

A wide variety of in vitro assays may be used to screen candidate agents for the desired biological activity, 
including, but not limited to, labeled in vitro protein-protein binding assays, protein-DNA binding assays 
(e.g. to identify agents that affect expression), electrophoretic mobility shift assays, immunoassays for
protein binding, and the like. For example, by providing for the production of large amounts of a 
differentially expressed polypeptide, one can identify ligands or substrates that bind to, modulate or mimic 
the action of the polypeptide. The purified polypeptide may also be used for determination of three- 
dimensional crystal structure, which can be used for modeling intermolecular interactions, transcriptional 
regulation, etc. 

The screening assay can be a binding assay, wherein one or more of the molecules may be joined to a label,
and the label directly or indirectly provides a detectable signal. Various labels include radioisotopes,
fluorescers, chemiluminescers, enzymes, specific binding molecules, particles, e.g. magnetic particles, and 
the like. Specific binding molecules include pairs, such as biotin and streptavidin, digoxin and antidigoxin 
etc. For the specific binding members, the complementary member would normally be labeled with a 
molecule that provides for detection, in accordance with known procedures. 

A variety of other reagents may be included in the screening assays described herein. Where the assay is a
binding assay, these include reagents like salts, neutral proteins, e.g. albumin, detergents, etc. that are used
to facilitate optimal protein-protein binding, protein-DNA binding, and/or reduce non-specific or
background interactions. Reagents that improve the efficiency of the assay, such as protease inhibitors,
nuclease inhibitors, anti-microbial agents, etc. may be used. The components of the mixture are added in any
order that provides for the requisite binding. Incubations are performed at any suitable temperature,
typically between 4 and 40°C. Incubation periods are selected for optimum activity, but may also be
optimized to facilitate rapid high-throughput screening. Typically between 0.1 and 1 hours will be
sufficient.

Many mammalian genes have homologs in yeast and lower animals. The study of such homologs'
physiological role and interactions with other proteins in vivo or in vitro can facilitate understanding of
biological function. In addition to model systems based on genetic complementation, yeast has been shown
to be a powerful tool for studying protein-protein interactions through the two hybrid system.
Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen bonding. 
Typically, one sequence will be fixed to a solid support and the other will be free in solution. Then, the two
sequences will be placed in contact with one another under conditions that favor hydrogen bonding. Factors
that affect this bonding include: the type and volume of solvent; reaction temperature; time of hybridization;
agitation; agents to block the non-specific attachment of the liquid phase sequence to the solid support
(Denhardt's reagent or BLOTTO); concentration of the sequences; use of compounds to increase the rate of
association of sequences (dextran sulfate or polyethylene glycol); and the stringency of the washing
conditions following hybridization. See Sambrook et al [supra] Volume 2, chapter 9, pages 9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar sequences 
over sequences that differ. For example, the combination of temperature and salt concentration should be
chosen that is approximately 12° to 20°C below the calculated Tm of the hybrid under study. The
temperature and salt conditions can often be determined empirically in preliminary experiments in which
samples of genomic DNA immobilized on filters are hybridized to the sequence of interest and then washed
under conditions of different stringencies. See Sambrook et al at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the DNA
being blotted and (2) the homology between the probe and the sequences being detected. The total amount
of the fragment(s) to be studied can vary by a magnitude of 10, from 0.1 to 1 μg for a plasmid or phage digest
to 10⁻⁹ to 10⁻⁸ g for a single copy gene in a highly complex eukaryotic genome. For lower complexity
polynucleotides, substantially shorter blotting, hybridization, and exposure times, a smaller amount of
starting polynucleotides, and lower specific activity of probes can be used. For example, a single-copy yeast
gene can be detected with an exposure time of only 1 hour starting with 1 μg of yeast DNA, blotting for two
hours, and hybridizing for 4-8 hours with a probe of 10⁸ cpm/μg. For a single-copy mammalian gene a
conservative approach would start with 10 μg of DNA, blot overnight, and hybridize overnight in the
presence of 10% dextran sulfate using a probe of greater than 10⁸ cpm/μg, resulting in an exposure time of
~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe and the
fragment of interest, and consequently, the appropriate conditions for hybridization and washing. In many 
cases the probe is not 100% homologous to the fragment. Other commonly encountered variables include 
the length and total G+C content of the hybridizing sequences and the ionic strength and formamide content 
of the hybridization buffer. The effects of all of these factors can be approximated by a single equation: 
Tm = 81 + 16.6(log₁₀ Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs (slightly 
modified from Meinkoth & Wahl (1984) Anal Biochem. 138: 267-284). 
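
The approximation above can be evaluated directly. The following Python sketch is offered only as an illustration and uses the coefficients exactly as they appear in the equation (81, 16.6, 0.4, 0.6, 600 and 1.5). It assumes that Ci is expressed in mol/L, which the text does not state explicitly; the function and parameter names are hypothetical.

```python
import math


def melting_temperature(salt_molar: float, gc_percent: float, formamide_percent: float,
                        length_bp: int, mismatch_percent: float = 0.0) -> float:
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    Tm = 81 + 16.6*log10(Ci) + 0.4*(%G+C) - 0.6*(%formamide)
         - 600/n - 1.5*(%mismatch)

    salt_molar is Ci, the monovalent ion concentration (assumed mol/L);
    length_bp is n, the length of the hybrid in base pairs.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


# Hypothetical hybrid: 1 M monovalent salt, 50% G+C, 50% formamide, 600 bp.
perfect = melting_temperature(1.0, 50.0, 50.0, 600)
mismatched = melting_temperature(1.0, 50.0, 50.0, 600, mismatch_percent=10.0)
print(round(perfect, 1), round(mismatched, 1))
```

Under the equation, each 1% of mismatch lowers the estimate by 1.5°C, so the second value is 15°C below the first.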

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be
conveniently altered. The temperature of the hybridization and washes and the salt concentration during the
washes are the simplest to adjust. As the temperature of the hybridization increases (ie, stringency), it
becomes less likely for hybridization to occur between strands that are nonhomologous, and as a result,
background decreases. If the radiolabeled probe is not completely homologous with the immobilized
fragment (as is frequently the case in gene family and interspecies hybridization experiments), the
hybridization temperature must be reduced, and background will increase. The temperature of the washes
affects the intensity of the hybridizing band and the degree of background in a similar manner. The
stringency of the washes is also increased with decreasing salt concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for a probe
which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology, and 32°C for
85% to 90% homology. For lower homologies, formamide content should be lowered and temperature
adjusted accordingly, using the equation above. If the homology between the probe and the target fragment
is not known, the simplest approach is to start with both hybridization and wash conditions which are
nonstringent. If non-specific bands or high background are observed after autoradiography, the filter can be
washed at high stringency and reexposed. If the time required for exposure makes this approach impractical,
several hybridization and/or washing stringencies should be tested in parallel.
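
The convenient temperatures listed above for hybridization in 50% formamide lend themselves to a simple lookup, sketched below in Python for illustration only. Because the stated homology ranges share their end points (90% and 95%), the sketch assumes that a boundary value belongs to the higher band; that choice, the function name and the `None` return for homologies below 85% are assumptions, not part of the text.

```python
from typing import Optional


def hybridization_temp_50pct_formamide(homology_percent: float) -> Optional[int]:
    """Convenient hybridization temperature (degrees C) in 50% formamide.

    Returns None below 85% homology, where the text instead advises lowering
    the formamide content and adjusting temperature using the Tm equation.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None


for homology in (100, 92, 85, 70):
    print(homology, hybridization_temp_50pct_formamide(homology))
```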

Nucleic Acid Probe Assays

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes
according to the invention can determine the presence of cDNA or mRNA. A probe is said to "hybridize"
with a sequence of the invention if it can form a duplex or double stranded complex, which is stable enough
to be detected.

The nucleic acid probes will hybridize to the Streptococcus nucleotide sequences of the invention (including
both sense and antisense strands). Though many different nucleotide sequences will encode the amino acid
sequence, the native Streptococcal sequence is preferred because it is the actual sequence present in cells.
mRNA represents a coding sequence and so a probe should be complementary to the coding sequence;
single-stranded cDNA is complementary to mRNA, and so a cDNA probe should be complementary to the
non-coding sequence.
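
Since a probe intended to detect mRNA should be complementary to the coding sequence, its sequence (written 5' to 3') is the reverse complement of the coding strand. The following Python sketch illustrates that operation; the helper name and the short example sequence are hypothetical, and the IUPAC ambiguity codes are handled on the assumption that such codes may occur in sequence data.

```python
# Complement table for DNA bases plus IUPAC ambiguity codes
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are self-complementary).
_BASES = "ACGTRYKMSWBDHVN"
_COMPS = "TGCAYRMKSWVHDBN"
_COMPLEMENT = str.maketrans(_BASES + _BASES.lower(), _COMPS + _COMPS.lower())


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


coding_fragment = "ATGGCAGTT"  # hypothetical coding-strand fragment
probe = reverse_complement(coding_fragment)
print(probe)
```

Applying the function twice returns the original sequence, which offers a quick sanity check on any probe derived this way.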

The probe sequence need not be identical to the Streptococcal sequence (or its complement); some
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe can
form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can include
additional nucleotides to stabilize the formed duplex. Additional Streptococcus sequence may also be
helpful as a label to detect the formed duplex. For example, a non-complementary nucleotide sequence may
be attached to the 5' end of the probe, with the remainder of the probe sequence being complementary to a
Streptococcus sequence. Alternatively, non-complementary bases or longer sequences can be interspersed
into the probe, provided that the probe sequence has sufficient complementarity with a Streptococcus
sequence in order to hybridize therewith and thereby form a duplex which can be detected.

The exact length and sequence of the probe will depend on the hybridization conditions (e.g. temperature,
salt condition etc.). For example, for diagnostic applications, depending on the complexity of the analyte
sequence, the nucleic acid probe typically contains at least 10-20 nucleotides, preferably 15-25, and more
preferably at least 30 nucleotides, although it may be shorter than this. Short primers generally require
cooler temperatures to form sufficiently stable hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al. [J. Am.
Chem. Soc. (1981) 103:3185], or according to Urdea et al [Proc. Natl Acad. Sci. USA (1983) 80:7461], or
using commercially available automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications, DNA or
RNA are appropriate. For other applications, modifications may be incorporated eg. backbone
modifications, such as phosphorothioates or methylphosphonates, can be used to increase in vivo half-life,
alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer (1995) Curr Opin Biotechnol
6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as peptide nucleic acids may also be used
[eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et al. (1993) TIBTECH 11:384-386].

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting small
amounts of target nucleic acid. The assay is described in Mullis et al. [Meth. Enzymol (1987) 155:335-350]
& US patents 4,683,195 & 4,683,202. Two "primer" nucleotides hybridize with the target nucleic acids and
are used to prime the reaction. The primers can comprise sequence that does not hybridize to the sequence
of the amplification target (or its complement) to aid with duplex stability or, for example, to incorporate a
convenient restriction site. Typically, such sequence will flank the desired Streptococcus sequence.

A thermostable polymerase creates copies of target nucleic acids from the primers using the original target
nucleic acids as a template. After a threshold amount of target nucleic acids are generated by the
polymerase, they can be detected by more traditional methods, such as Southern blots. When using the
Southern blot method, the labelled probe will hybridize to the Streptococcus sequence (or its complement).
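
A primer pair carrying non-hybridizing flanking sequence, as described above, can be sketched as follows. This Python example is illustrative only: the target sequence, the function names and the very short annealing length are hypothetical, and the EcoRI (GAATTC) and HindIII (AAGCTT) recognition sites are used merely as examples of a convenient restriction site.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(target: str, anneal_len: int,
                   fwd_tail: str = "", rev_tail: str = "") -> Tuple[str, str]:
    """Build forward and reverse primers with optional 5' tails.

    The 3' portion of each primer hybridizes to one end of the target; the
    5' tail does not hybridize and flanks the amplified sequence.
    """
    if anneal_len < 1 or anneal_len > len(target):
        raise ValueError("anneal_len must be between 1 and len(target)")
    forward = fwd_tail + target[:anneal_len]
    reverse = rev_tail + reverse_complement(target[-anneal_len:])
    return forward, reverse


ECORI_SITE = "GAATTC"
HINDIII_SITE = "AAGCTT"
example_target = "ATGCCCGGGTTTAACG"  # hypothetical target, shortened for display

# anneal_len=4 keeps the example readable; practical primers are far longer.
fwd, rev = tailed_primers(example_target, 4, ECORI_SITE, HINDIII_SITE)
print(fwd, rev)
```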

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook et
al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified and
separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid support, such
as nitrocellulose. The solid support is exposed to a labelled probe and then washed to remove any
unhybridized probe. Next, the duplexes containing the labeled probe are detected. Typically, the probe is
labelled with a radioactive moiety.
SEQUENCE LISTING



SEQ ID NO. 1301: SAG0466 FROM THE 2603V/R GBS STRAIN

CTCCTGCCCCTGCAATGGCAGTTAGACCCATAGGTTTATTTTTATATTTTAATGCCTGCATAAGATGAAGGATATTAATAATTCCT
GAGCAGGCATAAGGGTGTCCGTAAGCTAATGTCCCTCCAAAAATATTGAATTTTTCTCTCTCTTCAGGATAATAATGATTAAATAG 
AGCATCAATCGCTGCAAATGGTTCATTCCATTCAATTGCATCATAATCCGATATTTTAGTATGAGTTTCTGTTAATAGTTTTTCCG
TAGCCGTGTGAACCAATTCTGGACTAAGCTTGGGATCTCCTGCTACTTCTACAATGTGAACAATCCGGAATTCTGTTTTCTGACTC 
TGAAGCGTTAGAAATGCAGCAGCATCGTGCATTAAACAAACATTTCCAATAGTGAGCAAAGGTGAATTTTCCATCAATCTTGGTAA 
TTTTTGAAAAAATGTTtCTTTTaGTTTTCTAACGCCTTGATCTCGCATCCCTTCCATTGGTAAGATTACyTCTTCTAAATAGCCAC 
CTTGTTTAGCTGTTAAGGCGCGTTTATGGCTCAAGAATGCCAATTTATCTAACATTTCTCTTCTAAAaCCATATTTTTGACAGACT
CTCTGGGCCCCTTCTAACATTACAGTTTCAGCATAAGAGTCAGGAGAAAACTGAGCAACTGTATATTCTCCGTTACGATTATCTTC 
TTTAGCATAACGTCTCATAGGTTGAAGAGAACTACTTTCAATCCCCCCAACAA.GAACTTTTTCATTAATACCGGTACTGATTTTTA 
GATAACCAAAAAACAAGGCAGAACTTGATGAAGCACACTGCATATCAATCGTTTGTACTGGAATATAGGATTCATAATCAGAAAAA
AGAGTCATCAAACGACCAATATTGCCCCCAGTACCAACTGTGTTCCCACAAATAATACTATCAATGTTAGATTCTGATTCTATTTT 
TTTTATTTGATTT2\AAAGGTGTGCTCCTAAAAGTTCTGGACGGTAAGTTTAAA.TTGCTT 

SEQ ID NO. 1302: SAG0466 FROM THE M732 GBS TYPE III STRAIN 

TCGGTATAA^GGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATAGAATCA 
GAATCTAATATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTA 
TGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTG 
CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGT 
AACGGAGAA.TATACCGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGAAGGGGCACAAAGAGTCTGTCAAAR. 
ATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGAGCCATAAACGCGCCTTAACAGCTAAACAAGGTGGCTATTTAG 
AAGAGGTAATCTTACCAATGGAAGGGATGCGAGATCAAGGCGTTAGAAAACTAAAAGAAGCATTTTTTCAAAAATTACCAAGATTG
ATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAAC 
AGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTAT 
TAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTATTTAATCAT 
TATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTA 

SEQ ID NO. 1303: SAG0466 FROM THE 090 GBS TYPE Ia STRAIN

TTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAA 
CGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTGCCGGTATTAATGAAAAAGTTCTT 
GTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGGAGAATATACCGTTGCTCA 
GTTTTCTCCTGACTCTTAkGCTGAAACTGTAATGtTAGAAGGGGCACAAAGAGTCTGTCAAAAATATGGTTTtAGAAGAGAAATGT 
TAGATAAATTGGCATTCTTGAGCCATAAACGCGCCTTAACAGCTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAATGGAA 
GGGATGCGAGATCAAGGCGTTAGAAAACTAAAAGAAGCATTTTTTCAAAAATTACCAAGATTGATGGrAAATTCACCTTTGCTCAC 
TATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTwACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTG 
TAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATA
TCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAA
ATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGG 

SEQ ID NO. 1304: SAG0466 FROM THE COH1 GBS TYPE Ia STRAIN

ATCGGTATAAAAGGGAAGCAATTTAAAATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATAGAATCA 
GAATCTAATATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTA 
TGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGGTATCTAAAAA 

SEQ ID NO. 1305: SAG0466 FROM THE CJB GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

TTTTCAAAAATTACCAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTC 
TAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTT 
CACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGC 
GATTGATGCTTTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTT 
AATGCCTGCTCAGGAATTATTAATATCC 

SEQ ID NO. 1306: SAG0466 FROM THE CJB110 GBS NONTYPEABLE STRAIN

GGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTCTTAAATCAAATAAAAAAAATATAACCAGA 
ATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATG 
AATCCTATATTC 

SEQ ID NO. 1307: SAG0466 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

CAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGT 
CAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGA 
AAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTTTAT
TTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGA 
ATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGCCTAACTGCCATTGCAGGGGCA

SEQ ID NO. 1308: SAG0466 FROM THE 18RS21 GBS TYPE II STRAIN

CCTTAACAGTTAAACAAGGTGGCTATTTAGAAGAGGTAATCTTACCAATGGAAGGGATGCGAGATCAAGGCGTTAGAAAACTAAAA 
GAAACATTTTTTCAAAAATTACCAAGATTGATGGAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGC
TGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAG
AATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCA
TTTGCAGCGATTGATGCTCTATTTAATCATTATTATCCTGAAGAGAGAGAAAAATTCAATATTTTTGGAGGGACATTAGCTTACGG 
ACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTG 
CCATTGCAGGGGCAG 

SEQ ID NO. 1309: SAG0466 FROM THE 18RS21 GBS TYPE II STRAIN 

TCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCAAATAAAAAAAATAGAATCA 
GAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTA 
TGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTA 
CCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTATGCTAAAGAAGATAATCGT
AACGGAGAATATACAGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGAAGGGGCCCAGAGAGTCTGTCAAAA 
ATATGGTTTTAGAAGAGAAATGTTAGATAAATTGGCATTCTTGAGCCATAAACGCGCCTTAACAGCTAAACA

SEQ ID NO. 1310: SAG0466 FROM THE H36b GBS TYPE Ib STRAIN

TTTGGGCTACGAACACCTATCGGTATAAAAGGGAAGCAATTTAAACATTACCGTCCAGAACTTTTAGGAGCACACCTTTTAAATCA 
AATAAAAAAAATAGAATCAGAATCTAACATTGATAGTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGA 
TGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTT 
GGTTATCTAAAAATCAGTACCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTA 
TGCTAAAGAAGATAATCGTAACGGAGAATATACAGTTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGAAGGGG 
CCC 

SEQ ID NO. 1311: SAG0466 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

GAAAATTCACCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGA 
AT T C C GG AT T G T T C AC AT T GT AG AAGT AG C AGG AG AT C C C AAGCT T AG T C C AG AAT T G GT T C AC AC GG C T AC G G AAAAAC T AT T AA 
CAGAAACTCATACTAAAATATCGGATTATGATGCAATTGAATGGAATGAACCATTTGCAGCGATTGATGCTCTATTTAATCATTAT 
TAT C C T GAAGAG AG AG AAAAAT T C AAT AT T T T T G GAG GG AC AT T AGC T T AC G G AC AC C CT TAT G C CT G C T C AG GAAT TAT T AAT AT 
CCTTCATCTTATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGCAGGA 

SEQ ID NO. 1312: SAG0466 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

CCTTTGCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGAT 
T GT T C AC AT T GT AG AAGT AGC AGG AG AT C C C AAG CT T AG T C C AG AAT T GGT T C AC ACG G C T AC GG AAAAAC TAT T AAC AG AAAC T C 
AT AC T AAAAT AT C G GAT TAT GAT G C AAT T GAAT G GAAT GAAC CAT T T G C AG C GAT T GAT GCT T T AT T T AAT CAT TAT TAT C C T G AA 
GAGAGAGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCT 
TATGCAGGCATTAAAATATAAAAATAAACCTATGGGTTCTAACTGC 

SEQ ID NO. 1313: SAG0466 FROM THE M781 GBS TYPE III STRAIN 

G C AAT T T AAAC AT T ACCGT C C AG AACT TT T AG G AGC AC ACC T C T T AAAT C AAAT AAAAAAAAT AG AAT C AGAAT CT AAT AT T GAT A 
GTATTATTTGTGGGAACACAGTTGGTACTGGGGGCAATATTGGTCGTTTGATGACTCTTTTTTCTGATTATGAATCCTATATTCCA 
GTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTTGGTTATCTAAAAATCAGTGCCGGTATTAATGAAAA 
AGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTACGCTAAAGAAGATAATCGTAACGGAGAATATACCG 
TTGCTCAGTTTTCTCCTGACTCTTATGCTGAAACTGTAATGTTAGA 

SEQ ID NO. 1314: SAG0466 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

CCTTTGCT C ACT AT T GG AAAT GT T T GT T T AAT G C ACG AT G C T G C T G CAT T T C T AAC GC T T C AG AGT C AG AAAAC AG AAT T C C G GAT 
TGTTCACATTGTAGAAGTAGCAGGAGATCCCAAGCTTAGTCCAGAATTGGTTCACACGGCTACGGAAAAACTATTAACAGAAACTC 
AT ACT AAAAT AT C G GAT TAT GAT GC AAT T GAAT GGAAT GAAC CAT T T G C AG C GAT T GAT GCT CT AT T T AAT CAT TAT TAT C C T G AA 
GAG AG AG AAAAAT T C AAT AT T T T T G GAG G GAC AT T AG CT T AC G G AC AC C C T TAT G C CT G CT C AGG AAT TAT T AAT AT C CT T CAT C T 
TATGCAGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGC 

SEQ ID NO. 1315: SAG0466 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

GCTCACTATTGGAAATGTTTGTTTAATGCACGATGCTGCTGCATTTCTAACGCTTCAGAGTCAGAAAACAGAATTCCGGATTGTTC 
ACAT TGT AGAAGT AGC AGGAGAT CCC AAGCTT AGT CC AGAAT TGGTTC AC ACGGCTACGGAAAAACT AT TAACAGAAACT CAT ACT 
AAAAT AT C GG AT T AT GAT G C AAT T GAAT GGAAT GAAC CAT T T GC AG C GAT T GAT G C T C T AT T T AAT CAT TAT TAT C C T GAAGAG AG 
AGAAAAATTCAATATTTTTGGAGGGGCATTAGCTTACGGACACCCTTATGCCTGCTCAGGAATTATTAATATCCTTCATCTTATGC 
AGGCATTAAAATATAAAAATAAACCTATGGGTCTAACTGCCATTGCAGGGGCAGGA

SEQ ID NO. 1316: SAG0466 FROM THE JM9130013 GBS TYPE VIII STRAIN 

T T T G G GC T AC GAAC AC C TAT C G GT AT AAAAG GG AAG C AAT T T AAAC AT T AC C G T C C AG AACT T T T AGG AG C AC AC CT T T T AAAT C A 
AAT AAAAAAAAT AG AAT C AG AAT C T AAC AT T GAT AGT AT TAT T T G T G GG AAC AC AGT T G G T AC T G GG GG C AAT AT TGGTCGTTT GA 
TGACTCTTTTTTCTGATTATGAATCCTATATTCCAGTACAAACGATTGATATGCAGTGTGCTTCATCAAGTTCTGCCTTGTTTTTT
GGTTATCTAAAAATCAGTACCGGTATTAATGAAAAAGTTCTTGTTGGGGGGATTGAAAGTAGTTCTCTTCAACCTATGAGACGTTA
TGCTAAAGAAGATAATCGTAACGGAGAATATA

SEQ ID NO. 1401: SAG0471 FROM THE 18RS21 GBS TYPE II STRAIN

TTAAATT TGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCAATT GAGAC CAATACT TTAGAAAACGGAAGACATAT C 
GTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTC 
T C C AGG AG CT GT T G AT AGAAC T AGT AAAAC AG T AAC AGGT GC T T T T AAT C T AAAT T GG G C T GAT AC T C AAG AAGT AGGT T CAGT T A 
TTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCC 
AATAATCCCGACGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTGC 
AGGAGCAGGTGGAGAAATTGGGCATATGATTGTTGATCCAGAAAATGGATTTACGTGCACATGTGGTAACAAAGGCTGCCTTGAGA 
CAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAGGGTTCGTCTGCCATTAAAGCAGCGATT 
GACACCGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGT 
ATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAG 
CAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATTAAGAT 

SEQ ID NO. 1402: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN

CGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTT 
CTCCAGGAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTT 
ATTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGC 
CAATAATCCCGATGTTGTTTTCGTAACCCTCGGAACAGGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTG 
CAGGAGCAGGTGGAGAAATTGGGCATATGATTGTTGATCCAGAKAATGGATTTACGTGCACATGTGGTAACAAAGGCTGTCTTGAG 
ACAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAACAATATGAAGGTTCGTCTGCCATTAAAGCAGCGAT 
TGACAACGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTG 
TATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCA 
GCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTG

SEQ ID NO. 1403: SAG0471 FROM THE COH1 GBS TYPE Ia STRAIN

ACAAGAAAAATGGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTT 
TGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAG7UVCTAGTAAAACAGTA 
ACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGA

SEQ ID NO. 1404: SAG0471 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

TT GGT AT C T T G AC G CT T GAG G AG AAGT AC AAG AAAAAT G GG C AAT T GAGAC C AAT ACT T TAG AAAAC G GAAGAC AT AT CGTTTCTG 
ATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGGTCTCCAGGA 
GCTGTTGATAGAACTAGTAAAAC

SEQ ID NO. 1405: SAG0471 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

CACCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGT 
CGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTA 

SEQ ID NO. 1406: SAG0471 FROM THE 2603V/R GBS TYPE V STRAIN 

GGGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTAT 
GGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTG 

SEQ ID NO. 1407: SAG0471 FROM THE H36b GBS TYPE Ib STRAIN

GGCAATTGAGACCAATACTTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATG 
G AT T AAC AAAAG AT G AC T T T C T C GGT AT C GG T AT GGGTTCTC C AGG AGC T GT T GAT AGAACT AG T AAAAC AGT AAC AGGT GCT T T T 
AATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAA 
TGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGACGTTGTTTTCGTAACC 

SEQ ID NO. 1408: SAG0471 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

GAG AC AGT T GC AT C AG C GAC AG GT GT T G T TAG AGT AG C AC G T C AAC T C G C AG AAC AAT AT GAG GGTTCGTCTGC CAT T AAAG C AG C 
GAT T G AC AAC G GT G AT AC T GT T AC AAGT AAAG AT AT T T T TAT AG C AG C AG AAG AT GGG GAT AAAT T T G C T AAT TCTGTTGTT G AAC 
GTGTATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCA 
GCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACA 

SEQ ID NO. 1409: SAG0471 FROM THE M732 GBS TYPE III STRAIN 

ACAAGAAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATCGTTTG 
AG C C T CT AT GG AT T AAC AAAAG AT G ACT T T C T C GGT AT C G GT AT GGGTTCTC C AG G AG CT G T T G AT AG AACT AG T AAAAC AGT AAC 
AGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAATTCCATTTTTTATTGATA 
ACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGATGTTGTTTTCGTAACCCTCGGAACA 
GGAGTAGGTGGAGGTGTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAAGAGCAGGTGGAGAAATTGGGCATATGATT 

SEQ ID NO. 1410: SAG0471 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

CAGCAGCAGGTGAATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAAGTCAACTAAAATT
AAGATTGCTGAACTAGGTAATGAT 

SEQ ID NO. 1411: SAG0471 FROM THE M781 GBS TYPE III STRAIN

AGAAGTACAAGAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTCTGATATCGTTGAATCTCTCAAACATC 
GTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGCTGTTGATAGAACTAGTAAAACA 
GTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCGGTTATTGAAAAAGAAGTTGGAATTCCATTTTTTAT 
TGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAATCCCGATGTTGTTTTCGTAACCCTCG
GAACAGGAGTA

SEQ ID NO. 1412: SAG0471 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTA 
CCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAAT 
TTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTTTCCCACAAGTTAAAAA 

SEQ ID NO. 1413: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN

AAATTTGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTC 
TGATATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAG 
GAGCTGTTGATAGAACTAGTAAAACAGTAACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAA 
AAAGAAGTTGGAATTCCATTTTTTATTGATAACGATGCTAATGTTGCAGCACTTGGTGAACGCTGGGTAGGTGCTGGTGCCAATAA 
T C C CG AC GT T GT T T T C GT AAC C CT C G G AAC AG G AGT AG GT GG AG G 

SEQ ID NO. 1414: SAG0471 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGATGGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGT 
TACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGA 
AT T T T T ACGT AGT CG C GT T G AG AAAT ACT T T AT C AC AT TTGCTTTCC C AC AAGT T AAAAAG T C AAC T AAAAT T AAGAT T G 

SEQ ID NO. 1415: SAG0471 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

GTTATCGCAGATGGTAACCTCATCCATGGTGTTGCAGGAGCAGGTGGAGAAATTGGGCATATGATTGTTGATCCAGAAAATGGATT 
TACGTGCACATGTGGTAACAAAGGCTGCCTTGAGACAGTTGCATCAGCGACAGGTGTTGTTAGAGTAGCACGTCAACTCGCAGAAC 
AATATGAGGGTTCGTCTGCCATTAAAiSCAGCGATTGACCACGGTGATACTGTTACAAGTAAAGATATTTTTATAGCAGCAGAAGAT 
GGGGATAAATTTGCTAATTCTGTTGTTGAACGTGTATCACGTTACCTTGGACTGGCAGCAGCTAATATTTCAAATATTTTAAACCC 
TGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGT^ATTTTTACGTAGTCGCGTTGAGAAATACTTTGTCACATTTGCTT 
TCCCACAAGTTAAAAAGTCAACTAA

SEQ ID NO. 1416: SAG0471 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

TGGTATCTTGACGCTTGAGGGAGAAGTACAAGAAAAATGGGCAATTGAGACCATACTTAGAAAACGGAAGACATATCGTTTCTGAT 
ATCGTTGAATCTCTCAAACATCGTTTGAGCCTCTATGGATTAACAAAAGATGACTTTCTCGGTATCGGTATGGGTTCTCCAGGAGC 
TGTTGATAGAACTAGTT^AAACAGTCACAGGTGCTTTTAATCTAAATTGGGCTGATACTCAAGAAGTAGGTTCAGTTATTGAAAAAG
AAGCTGGAATTCCATTTTTTATTG

SEQ ID NO. 1417: SAG0471 FROM THE 2603V/R TYPE V GBS STRAIN (REVERSE COMPLEMENT) 

AGCAGCTAATATTTCAAATATTTTAAACCCTGATTCTGTGGTTATTGGTGGCGGTGTCTCAGCAGCAGGTGAATTTTTACGTAGTC
GCGTTGAGAAATACTTTGTCACATTTGTTTTCCCACAAGGT

SEQ ID NO. 1501: SAG0492 FROM THE 1169NT1 GBS NONTYPEABLE STRAIN

T G AC T T G GAT AT T CAT C AAGG AG AAGT GGT GG T TAT TAT TGGCCCTTCTGGCTCT G GT AAGT C AAC AT T T T T AAG AAC AAT G AAT C 
T C T T G GAAGT AC C AAC AAAGGGAAC AGT G ACT T T T G AAGGAAT T G AT AT AAC AG AC AAAAAAAAT GAT AT T T T T AAAAT G C GC G AA 
AAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAA 
GGGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAG 
CTAGCTTATCTGGAGGACAACAACAACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCT 
ACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGT 
CACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGCATTATTGTGAGCAAGGGACCCCTAA
GGAAGTAT

SEQ ID NO. 1502: SAG0492 FROM THE 18RS21 GBS TYPE II STRAIN 

TTGGGAAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGT 
AAG T C AAC AT T T T T AAG AAC AAT G AAT CT C T T G GAAGT AC C AAC AAAG G G AAC AG T G AC T T T T G AAGGG AT T GAT AT AAC AG AC AA 
AAAG AAT GAT AT T T T T AAAAT G CG CG AAAAAAT GGG CAT G G T T T T T C AAC AGT T C AAT CT AT T T C C C AAT AT G ACT GT AC TAG AAA 
ATATTACTTTATCACCTATTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGA 
CTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAA 
TCCTCATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAG 
CTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGACGCA 
GAAATTAT

SEQ ID NO. 1503: SAG0492 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

AAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTC 
AAC ATTTTT AAGAACAATGAAT CT CTT GGAAGT ACCAACAAAGGGAACAGT GACTTTTGAAGGG AT TGAT ATAACAGACAAAAAGA 
ATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATT 
ACTTTATCACCTATTAAGACAAAGGGGCTTTCTAATCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGACTCAA 
AG AGAAGGCT AAT AC T TAT C GAG CT AG CT TAT C T GG AG G AC AACAAC AACGAAT T G CT AT T G C AAG AG G T CT T G C AAT G AAT C C T G 
ATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAA 
TCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGAAT 
TATTGTTGAGCAAGGGGCCC

SEQ ID NO. 1504: SAG0492 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATT 
T T T AAG AAC AAT GAAT CT C T T GGAAGT AC C AAC AAAG GG AAC AG T GAC T T T T G AAG GG AT T GAT AT AAC AG AC AAAAAGAAT GAT A 
TTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTA 
T C AC C TAT T AAG AC AAAGGG AC T T T C T AAG CTT GAT G C T C AG AC AAAAG C AT ACG AG CT AC T T G AAAAAGT T G G ACT C AAAG AG AA 
GGCTAATGCTTATCCAGCAAGCTTATCTGGAGGACAACAACAACGGATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCC 
TTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTG7VAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGT 
ATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGGATTATTGT 
TGAGCAAGGGACCCCTAAGAAAGTAT 

SEQ ID NO. 1505: SAG0492 FROM THE 090 GBS TYPE Ia STRAIN

TGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACA 
GTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTT 
CAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGA 
CAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCTAGCTTATCTGGAGGGC AACAAC AA 
CGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGT 
AGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTG 
AAGTAGCGGATCGTGTCATTTTTATGGATGCAGGCATTATTGTTgAsCAAGGGACCCCTAAGGAAGTA 

SEQ ID NO. 1506: SAG0492 FROM THE A909 GBS TYPE Ia STRAIN

CAATACAAGGACTTCATAAZ\AGTTTTGGGAAAAATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTT 
ATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTT 
TG AAGGGATTG AT AT AAC AGAC AAAAAGAAT GAT AT TTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTC AAC AGTTC AAT CT AT 
TTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCA 
TATGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGC 
TATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAG 
TCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCG 
GATCGTGTCATTTTTATGGATGCAGGAATTATTGTgAGCAAGGGGCCCCTAAGGAAGTATTTGAGCAGACAAAAGAAATCCGCACA 
AGAGATTTCTT

SEQ ID NO. 1507: SAG0492 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

GACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCT 
CT T G G AAGT AC C AAC AAAGG G AAC AGT G ACT T T T G AAGGG ATT GAT AT AAC AG AC AAAAAGAAT GAT AT T T T T AAAAT G C G CG AAA 
AAAT G GG CAT GG T T T T T C AAC AGT T C AAT CT AT T T C C C AAT AT GAC T GT AC T AG AAAAT AT T AC T T TAT C AC CT AT T AAG AC AAAG 
GGACTTTCTAAGCTTGATGCTCAGACAAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGC 
TAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTTCTTTTTGATGAACCTA 
CTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTATGACGATGGTTATTGTC 
ACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCTTTTTATGGATGCGGGAATTATTGTGAGCAAGGGACC 

SEQ ID NO. 1508: SAG0492 FROM THE H36b GBS TYPE Ib STRAIN

ATGAGGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACA 
T TTTT AAGAACAATGAAT CT CTT GGAAGT AC C AAC AAAGGG AAC AGTGACTT T T GAAGGGATT GAT AT AAC AGAC AAAAAGAAT GA 
TAT T T T T AAAAT G CG CG AAAAAAT GGG C AT GG T T T T T C AAC AG T T C AAT C T AT T T C C C AAT AT G ACT GT AC TAG AAAAT AT T AC T T 
TAT C AC C T AT T AAG AC AAAG G GG CT T T CT AAG CTT GAT G CT C AG AC AAAAG CAT AT G AGC T ACT T G AAAAAGT T G GAC T C AAAG AG 
AAGGCTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGT 
CCTTCTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTG 
GTATGACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCASGAATTATT 
GTTGAGCAAGGGGCCCCTAAGGAAGTAT 

SEQ ID NO. 1509: SAG0492 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

GGTTTTAAAAGGCATTGACTTGGATATTCATCAAGGAGAAGTAGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTT 
T AAG AAC AAT GAAT C T CT T GGAAGT AC C AAC AAAG G G AAC AG T GAC TTTT G AAG GG AT T GAT AT AAC AG AC AAAAAGAAT GAT AT T 
TTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTCAATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATC
ACCTATTAAGACAAAGGGGCTTTCTAAGCTTGATGCTCAGACAAAAGCATATGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGG
CTAATACTTATCCAGCTAGCTTATCTGGAGGACAACAACAACGAATTGCTATTGCAAGAGGTCTTGCAATGAATCCTGATGTCCTT 
CTTTTTGATGAACCTACTTCAGCTCTTGATCCTGAAATGGTAGGTGAAGTCTTGACTGTTATGCAAGATTTAGCTAAATCTGGTAT 
GACGATGGTTATTGTCACTCATGAAATGGGTTTTGCACGTGAAGTAGCGGATCGTGTCATTTTTATGGATGCAGGAATTATTGTTG 
AGCAAGGGGCCCCTAAGGAAGTATTTAGCAAAACAAAAGAAAT

SEQ ID NO. 1510: SAG0492 FROM THE M732 GBS TYPE III STRAIN

GGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAATCTCTTGGAAGTACCAACAAAGGGAACAG 
TGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCGAAAAAATGGGCATGGTTTTTCAACAGTTC 
AATCTATTTCCCAATATGACTGTACTAGAAAATATTACTTTATCACCTATTAAGACAAAGGGACTTTCTAAGCTTGATGCTCAGAC 
AAAAGCATACGAGCTACTTGAAAAAGTTGGACTCAAAGAGAAGGCTAATGCTTATCCAGCAAGCTTATCTGG 

SEQ ID NO. 1511: SAG0492 FROM THE COH1 GBS TYPE Ia STRAIN

ATTGACTTGGATATTCATCAAGGAGAAGTGGTGGTTATTATTGGCCCTTCTGGCTCTGGTAAGTCAACATTTTTAAGAACAATGAA 
TCTCTTGGAAGTACCAACAAAGGGAACAGTGACTTTTGAAGGGATTGATATAACAGACAAAAAGAATGATATTTTTAAAATGCGCG 
AAAAAAT GG G C AT GGT T T T T C AAC AGT T C AAT C T AT TT C C C AAT AT G ACT GT ACT AG AAAAT AT TACT T T AT C AC CT AT T AAG AC A 
AAGG G AC T T T C T AAG CT T GAT G C T C AG AC AAAAG C AT ACGAGC T AC T T GAAAAAGT T GGACT C AAAGAGAAG GCT AAT G CT T AT C C 
AGCAAGCTTATCTGG 

SEQ ID NO. 1601: SAG0767 FROM THE M781 GBS TYPE III STRAIN 

TGGTCGCTCTGTCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAA 
AC T TAT T T TAT C ACG C AAGT AG GT C AAT T T AT T AAAAC AC AAG AAT T T GAT GAAAT G CC AT C T T C AG AT G AAAAG T T AAT GAC AAA 
C C AAACT GT T GAT T TAG AC AAAAT GGT T CGT C C AAGT GAT AT C T AT GAT GAT AAT G CAAT TGTTTTCCCCGTT T T AC AT GGAC C AA 
TGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTT CAAGCGTGGCT 
AT GG AT AAAAT T AC AAC AAAAC AAGT C C T T G C AAC AGT AG GT GT AC C T C AGG T T GC AT AT C AAAC T TAT T T T GAG GGT GAT GAT T T 
GGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTAATATGGGGTCATCAGTAGGTATTT 
CAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTG 
ACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTTGTTAAAGACGTCGATTT 
CTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCAATGC 
GTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATC 
TTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAATATGGGGCTAACTTATAG 
TGATTTGATTG 

SEQ ID NO. 1602: SAG0767 FROM THE 090 GBS TYPE Ia STRAIN

AAACCGGGCATTGTATTCAGTTCGTTTAAGAAGACTTGTCCATCTTTCGTCAAAAAGAAATCACAGCGTGATAAACCACAAGCCCC 
GAT T GCT T T AAAAG C T T T ACT T GC AT AT TG ACG C AT T G C T T C C AT AGT T GC T T CAT C AACT T TAG C T GG AAT AT C CAT AGT AAT T T 
TATTATCAATATATTTGGCGTCATAGTCATAGAAATCGACGTCTTTAACGACTTCGCCAGGAAAAGTTGTCTTAACATCATTATTG 
CCTAAAATACCTACTTCAATTTCACGAGCTGTCACGCCTTGTTCAATCAAAATACGGCTATCATACTTGAGAGCTAAGTCAATksC 
AGAGCGAAGTGAGGATTCATCTGTCGCTTTTGAAATACCTACTGATGACCCCATATTAGCCGGTTTTACAAAAATTGGGAAACTTA 
AAGT TTCTAAAGAGAGTTT AAT CGCATGTTCCAAAT CATC AC CCTC AAAAT AAGTTTGATATGCAACCTGAGGTACACCTACTGTT 
GCAAGGACTTGTTTTGTTGTAATTTTATCCATAGCCACGCTTGAAGATAGAATATTAGTCCCAACATAAGGCATCCTTAAAACTTC 
TAAAAATCCTTGGATAGAACCATCTTCCCCCATTGGTCCATGTAAAACGGGGAAAACAATTGCATTATCATCATAGATATCACTTG
GACGAACCATTTTGTCTT^AATCAACAGTTTGGTTTGTCATTAACTTTTCATCTGAAGATGGCATTTCATCAAATTCTTGTGTTTTA
ATAAATTGACCTACTTGCGTG

SEQ ID NO. 1603: SAG0767 FROM THE COH1 TYPE Ia STRAIN

TCGCTCTGCGG7\ACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTT 
ATTTTATCACGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAAGTTAATGACAAACCAA 
ACT GT TG AT T TAG AC AAAAT GGTTCGTC C AAGTG AT AT C T AT GAT GAT AAT G CAAT TGTTTTCCCCGTTT TAG AT G GAC CAAT G G G 
GGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTAT 

SEQ ID NO. 1604: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

CGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGG 
AAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGAT 
GGACAAATCTTCTTAAACGAACTGAATACAATGCCC

SEQ ID NO. 1605: SAG0767 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

AACGTGAAGTATCTGTACTGCTCTGCAGAAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCA 
CGCAAGTAGGTCAATTTATTAAAACACAAGAATTTGATGAAATGCCATCTTCAGATGAAAA

SEQ ID NO. 1606: SAG0767 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

CTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGAT 
AGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCC
TGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAG
TTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGAT
TTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCTCTGCT 
TTGGGAAAAT 

SEQ ID NO. 1607: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT)

TTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAAT 
AATGATGTTAAGACAACTTTTCCTGGCGT^AGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAAT 
TACT AT G GAT AT T C C AG CT AAAGT T GAT GAAG C AACT AT GG AAG C AAT G CG T C AAT AT GC AAGT AAAGC T T T T AAAGC AAT CGG GG 
CTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACT 
CAGTGGTCAATGTATCCCCTGCTTTGGGAAAAGTATGGGGCTAACCTT 

SEQ ID NO. 1608: SAG0767 FROM THE 18RS21 GBS TYPE II STRAIN 

ATCTGTACTGTCTGCAGAAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAGGT 
C AAT T T AT T AAAAC AC AAGAAT T T GAT G AAAT G C CAT C T T C AGAT G AAAAG T T AAT GAG AAAC C AAAC T G T T GAT T T AGAC AAAAT 
GGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAG 
GATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAA 

SEQ ID NO. 1609: SAG0767 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

GGCTATGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGGTGTACCTCAGGTTGCATATCAAACTTATTTTGAGGGTGATG 
ATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTAATATGGGGTCATCAGTAGGT 
AT T T C AAAAG C GAC AGAT GAAT C CT C ACT TCGCTCTG C AAT T G ACT T AGC T C T C AAGT AT G AT AGC C GT AT T T T GAT T G AAC AAGG 
CGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTCGTTAAAGACGTCG 
ATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCCAGCTAAAGTTGATGAAGCAACTATGGAAGCA 
ATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGAATGGAC 
AAATCTTCTTAAACGAACTGAAATAC

SEQ ID NO. 1610: SAG0767 FROM THE 2603V/R GBS TYPE V STRAIN 

TCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGAT7VAATTTTTTGTTAAAACTTATTTTATCACGCAAGTAGGTCA 
AT T T AT T AAAAC AC AAG AAT T T GAT GAAAT GC CAT C T T C AG AT G AAAAGT T AAT GAC AAAC C AAACT GT T GAT T TAG AC AAAAT GG 
TTCGTCCAAGTGATATCTATGATGATAAT 

SEQ ID NO. 1611: SAG0767 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

AAAACCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCA 
AGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACA 
ACTTTTCCTGGCGAAGTCGTTAAAGACGTCGATTTCTATGACTATGACGCCAAATATATTGATAATAAAATTACTATGGATATTCC 
AGCTAAAGTTGATGAAGCAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCAC 
GCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTAT 
CCCCTGCTTTGGGAAAATATGGGGCTAACTTATAG 

SEQ ID NO. 1612: SAG0767 FROM THE H36b TYPE Ib STRAIN

CGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAACTTATTTTATCACGCA 
AGT AGGT C AAT T TAT T AAAAC AC AAG AAT T TGAT GAAAT GC CAT CT T C AG AT G AAAAGT T AAT GAC AAAC C AAACT GTT GAT T TAG 
ACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGGAAGATGGTTCT 
ATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTATGGATAAAATTACAAC 
AAAACAAGTCCTTGCAACAGTAG

SEQ ID NO. 1613: SAG0767 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

ATGCGATTAAACTCTCTTTAGAACCTTTAAGTTTCCCAATTTTTGTAAACCCGGCTAATATGGGGTCATCAGTAGGTATTTCAAAA 
GCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATTTTGATTGAACAAGGCGTGACAGC 
TCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGTTGTTAAAGACGTCGATTTCTATG 
ACT ATGACGCC AAAT AT AT TGAT AAT AAAAT T ACT ATGG AT ATT CCAGCTAAAGTTGATG AAG C AACT ATGGAAGC AAT GCGTCAA 
TATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTGACGAAAGATGGACAAATCTTCTT 
AAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCTCTGCTTTGGGAAAATATGGGGCTAACTT 

SEQ ID NO. 1614: SAG0767 FROM THE M732 GBS TYPE III STRAIN 

GT CAT GCCGTGCT ATT AATT ATGAT AAATTTTT TGTT AAAACTT AT TTT AT CACGCAAGTAGGT CAATTT AT TAAAACACAAG AAT 
T T GAT GAAAT G C CAT CT T CAGAT GAAAAG T T AAT GACAAAC C AAAC T GT T GAT TTAGACAAAATGGTTCGTCC AAGT GAT AT C TAT 
GATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAATGGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGAT 
G C CT T AT GT T G GG AC T AAT AT T C T AT CT T C AAG C GT GGC T AT GG AT AAAAT T AC AAC AAAAC AAGT C C T T G C AAC AGT AG GT GT AC 
CTCAGG 

SEQ ID NO. 1615: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

TTTTGAGGGTGATGATTTGGAACATGCGATTAAACTCTCTTTAGAAACTTTAAGTTTCCCAATTTTTGTAAAACCGGCTAATATGG 
GGTCATCAGTAGGTATTTCAAAAGCGACAGATGAATCCTCACTTCGCTCTGCAATTGACTTAGCTCTCAAGTATGATAGCCGTATT
TTGATTGAACAAGGCGTGACAGCTCGTGAAATTGAAGTAGGTATTTTAGGCAATAATGATGTTAAGACAACTTTTCCTGGCGAAGT
CGT T AAAG ACGT CG AT T T CT AT GACT AT G AC G C C AAAT AT AT T GAT AAT AAAAT T ACT ATG G AT AT T CC AGCT AAAGT T GAT GAAG 
CAACTATGGAAGCAATGCGTCAATATGCAAGTAAAGCTTTTAAAGCAATCGGGGCTTGTGGTTTATCACGCTGTGATTTCTTTTTG 
ACGAAAGATGGACAAATCTTCTTAAACGAACTGAATACAATGCCCGGTTTTACTCAGTGGTCAATGTATCCCCTGCTTTGGGAAAA 
TATGGGGCTAACTTATAGTGA 

SEQ ID NO. 1616: SAG0767 FROM THE A909 GBS TYPE Ia STRAIN

TGGTCGCTCTGCGGAACGTGAAGTATCTGTACTGTCTGCAGAAAGCGTCATGCGTGCTATTAATTATGATAAATTTTTTGTTAAAA 
CT T AT T T TAT C AC GC AAGT AGGT C AAT T T AT T AAAAC AC AAG AAT T T GAT G AAAT GC CAT C T T C AG AT G AAAAGT T AAT G AC AAAC 
CAAACTGTTGATTTAGACAAAATGGTTCGTCCAAGTGATATCTATGATGATAATGCAATTGTTTTCCCCGTTTTACATGGACCAAT 
GGGGGAAGATGGTTCTATCCAAGGATTTTTAGAAGTTTTAAGGATGCCTTATGTTGGGACTAATATTCTATCTTCAAGCGTGGCTA 
TGGATAAAATTACAACAAAACAAGTCCTTGCAACAGTAGG

SEQ ID NO. 1617: SAG0767 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

AAG C AGGGGAT AC AT T GAC C ACT GAGT AAAAC C GGG CAT T GT AT T C AGT T C G T T T AAGAAGAT C T GT C CAT C T T T CGT C AAAAAG A 
AATCACAGCGTGATAAACCACAAGCCCCGATTGCTTTAAAAGCTTTACTTGCATATTGACGCATTGCTTCCATAGATGCTTCATCA 
ACTTTAGCTGGAATATCCATAGCAATTTTATTATCAATATATTTGGCG 

SEQ ID NO. 1701: SAG1086 FROM THE 1169NT1 GBS NONTYPEABLE STRAIN

T T T AAAG GT T GAT TCCTTTTT GACT CAT C AGGT AGAT T T T G AGT T AAT GCAGGAAAT AG GT AAAGT T T T T G CT GAT AAAT AT AAAG 
AAGC C GG C AT T AC G AAGGT T GT T AC GAT T GAAGC AT C T GG AAT T GCG C C AGC AGT GT AC GC AG C T C AAG CAT T GGGC GT ACC AAT G 
ATATTTGCTAAAAAGGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGWTACGAG 
TCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTA 
AAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGT 
GATTTGTTAGAAAAAACAGGTGTTCCAGT

SEQ ID NO. 1702: SAG1086 FROM THE 18RS21 GBS TYPE II STRAIN

TTTAGGTGAGAACATTTTAAAGGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTG
CTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACCAGCAGTGTACGCAGCTCAAGCA
TTGGGCGkACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTAC 
AAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAA 
ACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCT 
TTCCAAGATGGGCGTGATTTGTTAGAAAAAACA

SEQ ID NO. 1703: SAG1086 FROM THE H36b GBS TYPE Ib STRAIN

AAGAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAG 
TTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAAT
TGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTA
TCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACT 
GTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGC 
TGGTATCGGAATCYTTATTGAAAAATCTTTCCAAGATGGGCGTGATT

SEQ ID NO. 1704: SAG1086 FROM THE M732 GBS TYPE III STRAIN

ATTCTTTTTTGACTATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTA
CGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAA 
AAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTAT 
TGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTG 
AAATTATTGGTCAAGCTGAAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGAA 
AAAACAGGTGTTCCGGTTACTTCTCTTGCTCGT 

SEQ ID NO. 1705: SAG1086 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

GAACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAAATTTTGAGTT 
AATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTG 
CGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATR 
TTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGT 
ACTCATCATTGATGACTTTTTAACAAACGGTCAAGC

SEQ ID NO. 1706: SAG1086 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

ACATTTTAAAGGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATAT 
AAAGAAGCCGGCATTACGAAGGTTGTTACGATTGAAGCATCTGGAATTGCACCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACC
AATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTA 
CGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACMGTCYAGCG 
GCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGG 
GCGTGATTTGTTAGAAAA

SEQ ID NO. 1707: SAG1086 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

ACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTAA
TGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCG 
CCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTT 
AACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTAC 
TCATCATTGATGACTTTTTAGCAAACGGKCAAGCGGSTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTA

SEQ ID NO. 1708: SAG1086 FROM THE COH1 GBS TYPE Ia STRAIN

TTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAAATTTTGAGTTAATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAG
AAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATG 
ATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAG 
TCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTA 
AAGGATTACTTGAAATTATTGGTCAAGCTGAAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGT
GATTTGTTAGAAAAAACAGGTGTTCCGGTTAC 

SEQ ID NO. 1709: SAG1086 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

GCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGC 
ATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTA 
CAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCA 
AACGGTCAAGCGGCTAAAGGATTACTTGAAATTTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGAATCGTTATTGAAAAAT 
CTTTCCAAGATGGGCGTGATTTGTTAGAAAAAACAGGTGTTCCAGT 

SEQ ID NO. 1710: SAG1086 FROM THE 2603 V/R GBS TYPE V STRAIN

AACGTATTCTTAAAGATGGTGATGTTTTAGGTGAGAACATTTTAAAAGTTGATTCTTTTTTGACTCATCAGGTAGATTTTGAGTTA
ATGCAGGAAATAGGTAAAGTTTTTGCTGATAAATATAAAGAAGCCGGCATTACGAAGGTTGTTACAATTGAAGCATCTGGAATTGC 
GCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAAAAAAGCTAAGAACATTACTATGACTGAAGGTATCT 
TAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTATTGTGAGTCGCTTTTTATCTAACGATGATACTGTA 
CTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTTGAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGG 
TATCGGAATCGTTATTGAAAAATCTTTCCAAGATGGGCGTGATTTGTTAGAAAAAACAGGTGTTCCAG 

SEQ ID NO. 1711: SAG1086 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

ACGAAGGTTGTTACAATTGAAGCATCTGGAATTGCGCCAGCAGTGTACGCAGCTCAAGCATTGGGCGTACCAATGATATTTGCTAA 
AAAAGCTAAGAACATTACTATGACTGAAGGTATCTTAACTGCTGAAGTGTATTCTTTTACAAAGCAAGTTACGAGTCAAGTTTCTA
TTGTGAGTCGCTTTTTATCTAACGATGATACTGTACTCATCATTGATGACTTTTTAGCAAACGGTCAAGCGGCTAAAGGATTACTT
GAAATTATTGGTCAAGCTGGAGCTAAGGTTGCTGGTATCGGA 

SEQ ID NO. 1801: SAG1600 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

AATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAATT 
TCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGCAAGAAATTAAAGAAAAACTA
GACGTGCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTAC 
TCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGTATCCCTTGCTTGTCCGA 
AATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGTGGTTTATGAAACGTTGTCCCCATTAGTTGGT 
AAATTAGATACTTTAATTTTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGAGGTTAAATT 
AATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTATTTTGAGATAAACCATAATTGGCAAAATAAACACG 
GTGGTCATCACTTTTACACAACCGCCAGCCCAA

SEQ ID NO. 1802: SAG1600 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

AAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAG 
ATTAGAGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGC 
AGTTGCCTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAA
CTAATTTAGGGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCA
AATACTGCTGTGGTATCCCTTGCTTGTCCGAAATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGT 
GGTTTATGAAACGTTGTCCCCATTAGTTGGTAAATTAGATACTTTAATTTTAGGTTGCACGCATTATCCCCTATTACGTCCCATCA 
TTCAAAATGTTATGGGGGCTGAGGTTAAATTAATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTATTTT 
GAGATAAACCATAATTGGCAAAATAAACACGGTGGTCATCACTTTTACACAACCGCCAGCCCAAAAGGTTTTAAAGAAA

SEQ ID NO. 1803: SAG1600 FROM THE 090 GBS TYPE Ia STRAIN

AATCTTCATTGGAGACCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTtACCTGGCAGATGGTTAATTT
CTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGCAAGAAATTAAAGAAAAACTAG 
ACATACCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAATTCAGGGAAAGTTGGTATTATAGGTACT 
CCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATACTGCTGTGGTATCCCTTGCTTGTCCGAA 
ATTTGTTCCAATTGTGGAATCAAATCAGATGTCTTCTAGTTTAGCCAAAAAGGTGGTTTATGAAACGCTGTCCCCATTAGTTGGTA 
AATTAGATACTTTAATTTTAGGTTGCACGCATTATCCCTTATTACGTCCCATCATTCAAAATGTTATGGGGGCTGAGGTTAAATTA 
ATTGATAGTGGCGCAGAAACCGTTCGTGATATTTCTGTTTTATTGAACTATTTTGAGATaAmCCATaATTGGsmAAATAAACACGG
TGGTCATCACTTTTACACAACCGsCAGCCCAAAAGGTTTTTAAGGAAATTGCAGAACAATGGCTTAATCAAGAAATAAAT 

SEQ ID NO. 1804: SAG1600 FROM THE A909 GBS TYPE Ia STRAIN

GCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAATATCACG 
AACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGCGTGC 
AACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGACATCTGA 
TTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATTTTTTGACGATA
AGCATCTGATTTAACAGTCATGGGAGTACCTATAATACCAACTTTCCCTAAATTAGTTGATTTGATAGCTGCGCTAGCTCCTGGTA 
AAATAACGCCTAAAACAGGGATGTCTAGTTTTTCTTTAATTTCTTGCCAGGCAACTGCAGTTGCTGTATTACAAGCTATAACAATC 
ATCTTAACATTTTTAGTCAATAAGAAGTTAACCATCTGCCAGGTAAACTCTCTAATCTGTTGAGCAGGTCTAGGACCATACGGAGC 
TCTAGCCTGATCTCCAATGAAGATTACTTCCTCTTCTGGAAGTTGACGGAACATTTCCTTAACAACCGTTAAACCACCT

SEQ ID NO. 1805: SAG1600 FROM THE COH1 GBS TYPE Ia STRAIN

TTCCGTCAACTTCCAAAATATGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAG 
AGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTG 
CCTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCAACTAAT 
TTAGGGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGCTTTGTCTCCAAATAC 
TGCTGTGGTATCCCTTGCTTGTCCGAAAT 

SEQ ID NO. 1806: SAG1600 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

GTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAA 
TTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTTGCCTGGCAAGAAATTAAAGAAAAAC 
TAGACATAC 

SEQ ID NO. 1807: SAG1600 FROM THE 1169NT1 GBS TYPE V STRAIN 

CTTTTGGGCTGGCGGTTGTGTAAAATTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACA 
GAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATAATGGGACGTAATAGGGG
ATAATGCGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAATGTTTCATAAACCACCTTTTTGGCTAAACTAG 
AAGACATCTGATTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATT
TTTTGACGATAAGCATCTGATTTAACAGTCATGGGAGTACCTATAA

SEQ ID NO. 1808: SAG1600 FROM THE 1169NT1 GBS TYPE V STRAIN 

GTAATCTTCATTGGGGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAA 
TTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGCAGTT

SEQ ID NO. 1809: SAG1600 FROM THE 18RS21 GBS TYPE II STRAIN 

GAAATGTTCCGTCAACTTCCAGAAGAGGAAGTAATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACA 
GATTAGAGAGTTTACCTGGCAGATGGTTAACTTCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTG 
CAGTTGCCTGGCAAGAAATTAAAGAAAAACTAGACATCCCTGTTTTAGGCGTTATTTTACCAGGAGCTAGCGCAGCTATCAAATCA 
ACTAATTTAGGGAAAGTTGGTATTATAGGTACTCCCATGACTGTTAAATCAGATGCTTATCGTCAAAAAATTCAAGC

SEQ ID NO. 1810: SAG1600 FROM THE 18RS21 GBS TYPE II STRAIN

ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATATTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATA 
GTTCAATAAAACAGAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGG
GACGTAATATGGGATAATGCGTGCAACCTAAAATTAAAGTA 

SEQ ID NO. 1811: SAG1600 FROM THE 2603 V/R GBS TYPE V STRAIN 

ATTTCTTTAAAACCTTTTGGGCTGGCGGTTGTGTAATAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAAT 
AGTTCAATAAAACAGAAATATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATG 
GGACGTAATAGGGGATAATGCGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTT 
TTTGGCTAAACTAGAAGACATCTGATTTGATTCCACAATTGGAACAA

SEQ ID NO. 1812: SAG1600 FROM THE M781 GBS TYPE III STRAIN 

GGCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAATATCAC
GAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAGGGGATAATGCGTG 
CAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGA

SEQ ID NO. 1813: SAG1600 FROM THE M781 GBS TYPE III STRAIN

AATCTTCATTGGAGATCAGGCTAGAGCTCCGTATGGTCCTAGACCTGCTCAACAGATTAGAGAGTTTACCTGGCAGATGGTTAACT
TCTTATTGACTAAAAATGTTAAGATGATTGTTATAGCTTGTAATACAGCAACTGC 

SEQ ID NO. 1814: SAG1600 FROM THE JM9130013 GBS TYPE VIII STRAIN

TGGGCTGGCGGTTGTGTAAAAGTGATGACCACCGTGTTTATTTTGCCAATTATGGTTTATCTCAAAATAGTTCAATAAAACAGAAA
TATCACGAACGGTTTCTGCGCCACTATCAATTAATTTAACCTCAGCCCCCATAACATTTTGAATGATGGGACGTAATAAGGGATAA
TGCGTGCAACCTAAAATTAAAGTATCTAATTTACCAACTAATGGGGACAACGTTTCATAAACCACCTTTTTGGCTAAACTAGAAGA 
CATCTGATTTGATTCCACAATTGGAACAAATTTCGGACAAGCAAGGGATACCACAGCAGTATTTGGAGACAAAGCTTGAATTTTTT
GACGATAAGCATCTGATTTAACAGTCATGGGAGTACCTATAATACCAACTTTCCCTGAA

SEQ ID NO. 1901: SAG1680 FROM THE 2603 V/R GBS TYPE V STRAIN 

ATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTC 
GACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCAT 
CAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGT 
TTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCAT 
AGCTGCTTGAACTGCAACTGCTTTACCTGTACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTG
CTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCA
CCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACCACGAAT
ACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTT 
CTTGAAAAGAGGTATTCCACATTAACGGGGATAGAGAGTGGCGTGCAGG

SEQ ID NO. 1902: SAG1680 FROM THE H36b GBS TYPE Ib STRAIN

GTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAACAAA 
TCGTAACAATGCTGTTtCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAAC
TATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTA 
TTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCT 
GTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTA 
TTGTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAA 
CGTCCGGTTCCACCTTGATTAACGATAGTATTTAGAGCACCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAAC 
ACTCTGTTTAAATGGCATTGAAACATTAACACCACGAATACCCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTT
CTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTTCTTGAAAAGAGGTATTCCACATTAACGGGGATAGAGAGTGGCGTGCA 
GGA 

SEQ ID NO. 1903: SAG1680 FROM THE M732 GBS TYPE III STRAIN 

CTGGTCTAATTGCCAATCCTGCACGCCACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCC 
TATCTGACATTTGAAGTAGAAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGGTGTTAATGTTTC
AATGCCATTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTA 
ATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCT 
AAAAATAAAATAATTACAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGAGTTGCGGAAAT 
TAGATTATTTAATCGTAACAGCTCAAATTACGATAAGGTCATTGACTTATCAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAG 
TCGTTGATTATCTAGAAAATAAGACAGCATTTAAAGACGCTATTAGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGAATG 
AGGCCATTAGATAATTATAGTTTAATTAACGATCCAGATATTTTAACACCGAATTTAGTAGTTGTCGACTT 

SEQ ID NO. 1904: SAG1680 FROM THE M781 GBS TYPE III STRAIN 

AAATCAGCATCCCTAGACATTATAAGCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAA 
CCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTA 
GTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTG 
AAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTC 
CCTCCATAGCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACTGAAACCT 
TGAGCTGCTAAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCCGGTTCCACCTTGATTAACGATAGTATT
TACAGCACCCACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACAC
CACGAATACTCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATG 
TTTTTTTCTTGAAAAGAGGTATTCCACATTAACGGGGATAGAGAGTGGCGTGCA 

SEQ ID NO. 1905: SAG1680 FROM THE 090 GBS TYPE Ia STRAIN

GTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATCCCtTTGCTArATGATTT 
ATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTGGAACCGsACGTTTAGTAGGCCATATGACAGATG
GCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAGTTACAATAGCTGGTATTGGTGGTTCAGGT 
AAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGAGTTGCGGAAATTAGATTATTTAATCGTAATAGCTCAAATTAGGATAAGGTCAT
TGACTTATCAGATAAAATTAAAAAACAGTTTCAAATAAAGGTAGTCGTTGATTATCTAGAAAATAAGACAGCATTTAAAGACGCTA 
TTAGAACTAGTCATTTTTATATTGATGCTACTAGTTTAGGAATGArGCCATTAGATAATTATAGTTTAATTAACGATCCAGAAATT 
TTAACACCCAATTTAGTAGTTGTCGACTTGGTTTACAAGCCTAAAGAAACAGCATTGTTAGGATTTGTTAGACAAAATGGAGTGAA
ACATGCTTATAATGGTCTAGGGATGCTGATTTATCAAGGAGCAGA

SEQ ID NO. 1906: SAG1680 FROM THE A909 GBS TYPE Ia STRAIN

CCCTAGACCATTATAATCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGA 
CAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCA
ATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTT 
TTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTGTTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAG
CTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAATTATTTTATTTTTAGCACTGAAACCTTGAGCTGCT 
AAAGCTTTAAAACAACCAATGCCATCTGTCATATGGCCTACTAAACGTCCGGTTCCACCTTGATTAACGATAGTATTTACAGCACC 
CACTAATTTAGCTTGAGGAGATAAATCATCTAGCAAAGGGATAACACTCTGTTTAAATGGCATTGAAACATTAACACCACGAATAC 
CCAATGCCCTGACACCTCGAACAGCTTCTGTTAATTTACCCTCTTCTACTTCAAATGTCAGATAGGCATAATTCATGTTTTTTTCT 
TGAAAAGAGGTATTCCACATTAACGGGGATAG 

SEQ ID NO. 1907: SAG1680 FROM THE COH1 GBS TYPE Ia STRAIN

TGCACGCCACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGA
AGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGAGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTG 
TTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACT 

SEQ ID NO. 1908: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAA 
CAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTGGGTGTTAAAATTTCTGGATCGTTAATT 
AAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGT 
CTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTG 
AGCTATTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATAACTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCA 
GCTATTGTAACTATTTT 

SEQ ID NO. 1909: SAG1680 FROM THE CJB110 GBS NONTYPEABLE STRAIN

ACTCTCTATCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGGGT 
AAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATCCC 
TTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTGGAACCGGACGTTTAGTAG 
GCCATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCAGTGCTAAAAATAAAATAGTTACAATAGCTGGT 
ATTGGTG 

SEQ ID NO. 1910: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN 

ATTCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCAGCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAA 
CAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATT 
AAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGT
CTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTG 
AGCTGTTACGAT 

SEQ ID NO. 1911: SAG1680 FROM THE 1169NT1 GBS TYPE V STRAIN 

ACTTCTCTATTCCCCGTTAATGTGGAATACCTCTTTTCAAGAAAAAAACATGAATTATGCCTATCTGACATTTGAAGTAGAAGAGG 
GTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAATGTTTCAATGCCATTTAAACAGAGTGTTATC 
CCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTATCGTTAATCAAGGTGGAACC 

SEQ ID NO. 1912: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN 

TCGTTATTAATTGAAATGCTTCTGCTCCTTGATAAATCATCATCCCTAGACCATTATAAGCATGTTTCACTCCATTTTGTCTAACA 
AATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCGACAACTACTAAATTCGGTGTTAAAATTTCTGGATCGTTAATTAA 
ACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATCAATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCT 
TATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTTTTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAG 
CTGTTACGATTAAATAATCTAATTTCCGCAAC

SEQ ID NO. 1913: SAG1680 FROM THE 18RS21 GBS TYPE II STRAIN 

ATGCCTATCTGACATTTGAAGTAGAAGAGGGTAAATTAACAGAAGCTGTTCGAGGTGTCAGGGCATTGGGTATTCGTGGTGTTAAT
GTTTCAATGCCATTTAAACAGAGTGTTATCCCTTTGCTAGATGATTTATCTCCTCAAGCTAAATTAGTGGGTGCTGTAAATACTAT 
CGTTAATCAAGGTGGAACCGGACGTTTAGTAGGCCATATGACAGATGGCATTGGTTGTTTTAAAGCTTTAGCAGCTCAAGGTTTCA
GTGCTAAAAATAAAATAATTACAATAGCTGGTATTGGTGGTTCAGGTAAAGCAGTTGCAGTTCAAGCAGCTATGGAGGGAGTTGCG
G 

SEQ ID NO. 1914: SAG1680 FROM THE JM9130013 GBS TYPE VIII STRAIN 

CCCTAGACCATTATAAGTCATGTTTCACTCCATTTTGTCTAACAAATCGTAACAATGCTGTTTCTTTAGGCTTGTAAACCAAGTCG 
ACAACTACTAAATTGGGTGTTAATATTTCTGGATCGTTAATTAAACTATAATTATCTAATGGCCTCATTCCTAAACTAGTAGCATC
AATATAAAAATGACTAGTTCTAATAGCGTCTTTAAATGCTGTCTTATTTTCTAGATAATCAACGACTACCTTTATTTGAAACTGTT 
TTTTAATTTTATCTGATAAGTCAATGACCTTATCGTAATTTGAGCTATTACGATTAAATAATCTAATTTCCGCAACTCCCTCCATA 
GCTGCTTGAACTGCAACTGCTTTACCTGAACCACCAATACCAGCTATTGTAACTATTTTATTTTTAGCACTGAAACCTTGAGCTGC 
TAAAGCTTTAAAACAACCAATGCCATCTGTCAT 

SEQ ID NO. 2001: SAG1723 FROM THE COH1 GBS TYPE Ia STRAIN

ATCGATTCGATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGAT
GTCATCAAATATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAA 
AAAGGATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCA
ATGGCAGCAGCGAATTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGT 
GCCGTCGGTTCCTTCAAAA 

SEQ ID NO. 2002: SAG1723 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

TAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCG
ATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAA 
TATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAA
ATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCA
GCGAATTTACTACTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT 
CCCTTCAAAAAATCAACAATTGTGGGAG 

SEQ ID NO. 2003: SAG1723 FROM THE 18RS21 GBS TYPE II STRAIN

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGATATT
GTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAA
AAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAATTAC
AGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAA
TTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGTCCCTT 
CAAAAAATCAACGATTGTGGGAGAGGT

SEQ ID NO. 2004: SAG1723 FROM THE 2603 V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

AAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGAT
ATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATA 
TAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAAT
TACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGC
GAATTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT 

SEQ ID NO. 2005: SAG1723 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAATAATCGATTCGATATTGT
AGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAAAA 
ATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAATTACAG
GAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGAATT 
TACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA 

SEQ ID NO. 2006: SAG1723 FROM THE M781 GBS TYPE III STRAIN

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGATATT
GTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAA
AAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTTAAAAAGGATAAATTA
CAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAGCGA 
ATTTACT 

SEQ ID NO. 2007: SAG1723 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

TTGGTAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGA
TTCGATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCAT 
CAAATATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGG 
ATAAATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGC
AGCAGCGAATTTACCACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGT 
CGGCCCCTTCAAAAAATCAACG 

SEQ ID NO. 2008: SAG1723 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

TTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGATATT
GTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAATATAA
AAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAATTAC
AGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTAGCGCTTTCACCACTGACAGCAATGGCAGCAGCGAA
TTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGA 

SEQ ID NO. 2009: SAG1723 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

TAAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCG
ATATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAA
TATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTAGCTCAAGGAATATACTAAATTATTTAAAAAGG?TAA
ATTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACTACTGACAGCAATGGCAGCA
GCGAATTTACTACTGTCGTGCCTAAAGGCCACTATTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGT 

SEQ ID NO. 2010: SAG1723 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AAAGTTGACGGACACTCCATGGATCCAACTTTAGCTGACAAGGAACAGCTAGTAGTTCTCAAACAAACAAAAATCAATCGATTCGA
TATTGTAGTGGCTAACGAAGAAGAAGGCGGCCAAAAGAAAAAAATTGTTAAACGTGTCATTGGTATGCCAGGTGATGTCATCAAAT 
ATAAAAATGACACCTTAACTATTAACAATAAAAAAACAGAAGAACCTTACCTCAAGGAATATACTAAATTATTTAAAAAGGATAAA
TTACAGGAAAAATATTCGTATAACCCACTTTTCCAAGACCTAGCACAAAGCTCTACCGCTTTCACCACTGACAGCAATGGCAGCAG 
CGAATTTACTACTGTCGTGCCTAAAGGCCACTACTATCTTGTTGGTGATGACCGAATTGTCTCTAAAGATAGTCGTGCCGTCGGTC 
CCTTCAAAAAATCAACG

SEQ ID NO. 2101: SAG0079 FROM THE 2603V/R GBS TYPE V STRAIN 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC 
AGATGTTGAAAAAGCGTTG 

SEQ ID NO. 2102: SAG0079 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC
AGATGTTGAAAAAGCGTTGCTAGAACTCAAA 

SEQ ID NO. 2103: SAG0079 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

TGGTAAAGGGACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCGCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGG
CTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGATCAAGTAACAAACGGGATTGTA 
AAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGGTATCCACGTACTATTGAACAAGCACACGCCTT 
AGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTCTTATAGAGCGTTTGA 
GTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTAT 
CAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTCATATTGCTCAAGGAGAACCTATTCTTGAACACTATAG
TAAGCTTGGCCTTGTTACAGATATTGAAGGTAATCAAGAAATAA

SEQ ID NO. 2104: SAG0079 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAACCACGGGTTCGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC 
AGATGTTGAAAAAGCGTTGCTAGAA 

SEQ ID NO. 2105: SAG0079 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT 
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGC 
AGATGTTGAAAAAGCGTTG 

SEQ ID NO. 2106: SAG0079 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT 
CAAGGAGAATCTATTCTTGAACACTATCGAAAGCTTGGTCTTGTTACAGATATTGAAGGTAA 

SEQ ID NO. 2107: SAG0079 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

AATCTTTTAACCACGGGTTTGCTTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT
CAAGGAGAACCTATTCTTGAACACTATAG 

SEQ ID NO. 2108: SAG0079 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTCA 
ACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGT 
TCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATC
CACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTG 
GATCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACC 
AGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTC 
AAGGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCA 
GATGTTGAAAAAGCGTTGCTAG 

SEQ ID NO. 2109: SAG0079 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

CAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTT 
CCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCC 
ACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGG 
ATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCA
GTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCA
AGGAGAATCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAG
ATGTTGAAAAAGCGTTGCT 

SEQ ID NO. 2110: SAG0079 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT 
CCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC 
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTTAAACGTCGCTTGGACGTTAATATTGCT
CAAGGAGAACCTATTCTTGAACACTATAAAAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCA

SEQ ID NO. 2111: SAG0079 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

CTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTCAAC 
AGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTC 
CTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCA 
CGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGA 
TCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAG 
TAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAA
GGAGAACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGA 
TGTTGAAAAAGCGTTGCTAGAACTCAAA 

SEQ ID NO. 2112: SAG0079 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

AATCTTTTAATTACGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTC 
AACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGG 
TTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATAT
CCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT
GGATCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCAC
CAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCT
CAA 

>SEQ ID NO 2150:090 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKALLELK 

>SEQ ID NO 2151:114_1169NT frame: 2 

GKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPDQVTNGIVKER
LAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIIN 
RKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVHIAQGEPILEHYSKLGLVTDI 
EGNQEI 

>SEQ ID NO 2152: 114_18RS21 frame: 1 

NLLTTGSPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKALLE 

>SEQ ID NO 2153: 114_2603 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKAL

>SEQ ID NO 2154: 114_A909 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH 
YRKLGLVTDIEG

>SEQ ID NO 2155:114_A909 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH 
YRKLGLVTDIEG

>SEQ ID NO 2156: 114_CJB110 frame: 1 

NLLTTGLLGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
Y 

>SEQ ID NO 2157: 114_COH1 frame: 3

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI 
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY 
RKLGLVTDIEGNQEITEVFADVEKALL 

>SEQ ID NO 2158: 114_H36B frame: 3 

GDMFRAAMANQTEMGRLAKSYIDKGELVPDEVTNGIVKERLAEDDIAEKGFLLDGYPRTI 
EQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIINRKTGETFHKVFNPPVDYKEE 
DYYQREDDKPETVKRRLDVNIAQGESILEHYRKLGLVTDIEGNQEITEVFADVEKAL 

>SEQ ID NO 2159: 114_JM9130013 frame: 1 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YKKLGLVTDIEGN 

>SEQ ID NO 2160:114_M732 frame: 1 

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI 
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY 
RKLGLVTDIEGNQEITEVFADVEKALLELK

>SEQ ID NO 2161: 114_M781 frame: 1

NLLITGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQ 

SEQ ID NO. 2201: SAG0093 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2202: SAG0093 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT)

AAGCCTAACAGTCAACAATCATCACCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCGAACATCGTTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2203: SAG0093 FROM THE 18RS21 GBS TYPE II STRAIN 

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2204: SAG0093 FROM THE 2603V/R GBS TYPE V STRAIN 

ACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATTACAATTA
CCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTTCCTGT
TGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGAACATT 
TAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTAATTTG 
ACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATGGATAT
GAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGTCTTAC 
GGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAAAATAT 
ATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2205: SAG0093 FROM THE A909 GBS TYPE Ia STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAAATGACTAGTAACCC 
TAATTTGACGAAGGAACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT 
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2206: SAG0093 FROM THE CJB110 GBS NONTYPEABLE STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
TACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTG
GTTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACG 
AGAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACC 
CTAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCG 
ATGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTT 
TGTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTG 
CAAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2207: SAG0093 FROM THE COH1 GBS TYPE III STRAIN

CCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATTAAGAAATTAC
GATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGTG 
CCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAGA
ACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCTA 
ATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGATG
GATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTGT
CTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCAA 
AATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2208: SAG0093 FROM THE H36b GBS TYPE Ib STRAIN

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG 
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAwGAAATGACTAGTAACCC 
TAATTTGACGAAGGAACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

SEQ ID NO. 2209: SAG0093 FROM THE JM9130013 GBS TYPE VIII STRAIN 

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGATATCCTCTCAAAAAAGAAATAAGAAATT
ACAATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG 
TTCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA 
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA 
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGCCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA 

SEQ ID NO. 2210: SAG0093 FROM THE M732 GBS TYPE III STRAIN 

AGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATTA
CGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGGT
GCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGAG 
AACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCCT 
AATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGAT
GGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTTG 
TCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGCA 
AAATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAAAACCCAGCTTTCTT 

SEQ ID NO. 2211: SAG0093 FROM THE M781 GBS TYPE III STRAIN 

AAGCCTAACAGTCAACAATCATCATCTCAAAAGTTGAGGAATGAGGATATAAAAAAGACATCCTCTCAAAAAAGAAATAAGAAATT
ACGATTACCAGCTGTATCATCAAAAGATTGGAACTTGATTTTGGTCAATCGTGACCATAAACATGAAGAATTAAGTCCAGATGTGG
TGCCTGTTGAAAATATTTATTTGGATAAACGTATTACGAAGCAAGCTACTCAGTTTTTAGAGGCTGCTAGAGCAATTGATTCACGA
GAACATTTAATTTCGGGTTATCGTAGTGTTGCCTATCAGGAGAAGTTGTTCAATTCTTATGTTACTCAAGAGATGACTAGTAACCC 
TAATTTGACGAGGGGACAAGCAGAAAAGTTGGTAAAAACTTACTCTCAGCCTGCAGGTGCTAGTGAACACCAGACTGGATTAGCGA
TGGATATGAGTACTGTAGATTCTTTGAATGAGAGCGATCCTAGAGTAGTCAGTCAGTTGAAAAAGATAGCTCCACAATATGGTTTT
GTCTTACGGTTTCCGGATGGTAAAACAGCAGAAACAGGGGTAGGTTATGAAGATTGGCATTACCGCTATGTTGGGGTAGAGTCTGC 
AAAATATATGGTCAAACATCATTTAACATTAGAAGAATACATAACTTTATTAAAGGAGAATAACCAA

>SEQ ID NO 2250: 18_090 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ 

>SEQ ID NO 2251: 18_1169NT frame: 1 

KPNSQQSSPQKLRNEDIKKISSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK 
TAETGVGYEDWHYRYVGVESAKYMAEHRLTLEEYITLLKENNQ 

>SEQ ID NO 2252: 18_18RS21 frame: 1

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2253: 18_2603 frame: 3 

SQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPVENI
YLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRGQAE 
KLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGKTAE
TGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQNPAFLY 

>SEQ ID NO 2254: 18_A909 frame: 1

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTKE 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2255:18_CJB110 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKFTITSCIIKRLELDFGQS 

>SEQ ID NO 2256: 18_COH1 frame: 1

PNSQQSSSQKLRNEDIKKTSSQKRN 

>SEQ ID NO 2257: 18_H36B frame: 1 

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTXEMTSNPNLTKE 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2258: 18_JM9130013 frame: 1 

KPNSQQSSSQKLRNEDIKKISSQKRNKKLQLPAVSSKDWNLILVNRDHKHEELSPDVVPV 
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK 
TAETGVGYEDWHYRYVGVESAKYMAKHHLTLEEYITLLKENNQ

>SEQ ID NO 2259:18_M732 frame: 3 

PNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPVE 
NIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRGQ 
AEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGKT 
AETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQNPAF 



>SEQ ID NO 2260: 18_M781 frame: 1 

KPNSQQSSSQKLRNEDIKKTSSQKRNKKLRLPAVSSKDWNLILVNRDHKHEELSPDVVPV
ENIYLDKRITKQATQFLEAARAIDSREHLISGYRSVAYQEKLFNSYVTQEMTSNPNLTRG 
QAEKLVKTYSQPAGASEHQTGLAMDMSTVDSLNESDPRVVSQLKKIAPQYGFVLRFPDGK
TAETGVGYEDWHYRYVGVESAKYMVKHHLTLEEYITLLKENNQ

SEQ ID NO. 2301: SAG0163 FROM THE 090 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTATGAACTCTATATGCGTATTGATGATGAAAGGC 
GGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAA 
AGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTAGGACTATCGAGTGTGGGAGATTATCG
TGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGG
AAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAA 
GTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGA 
TATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAG 
CGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTTCCGGAGTCTAT 
GATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGG
AAGCCTAATTGACTTTGAGACAGGTAAQTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAG 
GACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2302: SAG0163 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GGTGATTGTTATGAAACCTCTACTATTGCGTATTTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGT 
CTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTC 
AGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAG 
GTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGC 
CCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCC
GGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTT
TACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCTCGTGCTGTTATTCGTGCAAGTTTAACGGGA 
GTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTT 
AGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAAC
ACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGATATATCAGTAAGAAACAGGCACAAGTCGAAAAAATT
ATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2303: SAG0163 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT 
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA 
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA
CTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTG
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA 
CGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2304: SAG0163 FROM THE 2603V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT)

GATATTTATATCATTCCCAAAGGTGATTGTTATGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTT 
TAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTT 
GTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATT 
CGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCT 
ATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTA 
TCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCT
TTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCG 
TGCAAGTTTAACGGGAGTGATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGG 
TTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACA 
GGTAATTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGC 
ACAAGTGCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2305: SAG0163 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA 
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA 
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA 
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA 
CTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTG 
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA 
CGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2306: SAG0163 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT 
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA 
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTACAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA 
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA
CTATTCATGCTAAAAGTATTTCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAACTTTAAAAAACACTCATCAGACAAGTG
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA
CGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2307: SAG0163 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

AGGTGATTGTTATGAAATTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTT 
ATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGA 
GGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTC 
ATCAGGACTTAAAATATTGGTTTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCT 
GTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGT
AGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTAC
GGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTA 
ATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGA 
AAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACT 
CATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATC
CCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2308: SAG0163 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

TCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTATGAACT
CTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTG 
TGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTA 
CGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAAATATTG 
GTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAA 
CTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAG 
ATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTT
AATTATCGGAGAGAAATAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGTTTTTTTCTACTATT 
CATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAAT 
AGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTGGAATA 
GACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAA 
AGTAGTCCAACTTTT 

SEQ ID NO. 2309: SAG0163 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

GTTCAATCATTAGCAAAGCAAGTCATTCATCAGGCAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTA 
TGAACTCTATATGCGTATTGATGATGAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTA 
AATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTT 
TCATTACGACTATCGAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTATTTTGTATTCAGGTCATCAGGACTTAAA 
ATATTGGTTTGATAATATAAAGCAAATGAAGGAAGTACTGGGTATAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTA
AAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAAT
GACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGA
TATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTGATGGTTTTTTCTA 
CTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAA 
TTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAGGTAATTTTAAAAAACACTCATCAGACAAGTG
GAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAA
CGGAAAGTAGTCCAACTTTT 

SEQ ID NO. 2310: SAG0163 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

TGACTTGTTATGAAACTCTATATGCGTATTTGATGATGAAAAGGCGGTTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTT 
ATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAGACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGA 
GGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTGGTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTC 
ATCAGGACTTAAAATATTGGTTTGATAATATAAAGTAAATGAAGGAAGTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCT 
GTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGTATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGT
AGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATATTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTAC
GGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCGACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTA 
ATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGATAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGA 
AAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAAGCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACT
CATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGACATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATC 
CCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

SEQ ID NO. 2311: SAG0163 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

CAGTAGAAGTAAATGCTCAAGATATTTATATCATTCCCAAAGGTGATTGTTATGAATTCTATATGCGTATTGATGATGAAAGGCGG 
TTTATTGATGTTTTTGAGTTTAATAGGATGGCTAGTCTTATTAGTCACTTTAAATTTGTGGCAGGCATGAACGTTGGAGAAAAAAG 
ACGAAGTCAATTAGGTTCTTGTGACTATGAACTGTCAGAGGGAAGACTGGTTTCATTACGACTATCAAGTGTGGGAGATTATCGTG
GTCAAGAATCTTTAGTTATTCGTACTTTGTATTCAGGTCATCAGGACTTAAAATATTGGTTTGATAATATAAAGCAAATGAAGGAA
GTACTGTGTGCAAGAGGGCTATATCTTTTTTCCGGCCCTGTGGGGAGTGGTAAAACAACTCTCATGTATCAATTAGCTTCAGAAGT 
ATTTAAAAATAAGCAAATTATCACGATTGAAGATCCGGTAGAAATCAAGAATGACAAGATGTTACAACTCCAATTGAATGAGGATA 
TTGGAATGACTTATGATGCTTTAATCAAACTGTCTTTACGGCATCGTCCAGATATTTTAATTATCGGAGAGATTAGAGATCAAGCG
ACGGCCCGTGCTGTTATTCGTGCAAGTTTAACGGGAGTAATGGTTTTTTCTACTATTCATGCTAAAAGTATTCCCGGAGTCTATGA 
TAGGCTTATAGAATTAGGGGTTAACTATCAAGAGTTAGAAAATAGTCTAAAATTAATAGCATATCAACGTTTAATTGGAGGAGGAA 
GCCTAATTGACTTTGAGACAAGTAACTTTAAAAAACACTCATCAGACAAGTGGAATAGACAAGTGGATATCTTGGCTGAAGAAGGA
CATATCAGTAAGAAACAGGCACAAGTCGAAAAAATTATCCCTCAAGAAACAACGGAAAGTAGTCCAACTTTT

>SEQ ID NO 2350:63_090 frame: 2 

AVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRS
QLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGTR
GLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDAL
IKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVYDRLIELGVNYQ
ELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKII
PQETTESSPTF 

>SEQ ID NO 2351: 63_1169NT frame: 3 

.LL.NLYYCVFDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGR
LVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGTRGLYLFSGPVGSGK 
TTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRHRPDILI 
IGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLKLIAYQR 
LIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGYISKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2352:63_18RS21 frame: X 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2353: 63_2603 frame: 1 

DIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDY 
ELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIKQMKEVLGIRGLYLFSG 
PVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRH 
RPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLK 
LIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHISKKQAQVRKNYPSRNNGK 
.SNF

>SEQ ID NO 2354:63_A909 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV 
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2355: 63_CJB110 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2356: 63_CJB110 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGTRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSISGVY 
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI 
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2357: 63_H36B frame: 1 

SLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFVAG
MNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDNIK 
QMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNE 
DIGMTYDALIKLSLRHRPDILIIGEK 

>SEQ ID NO 2358:63_JM9130013 frame: 1 

VQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKFV
AGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFDN 
IKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQL 
NEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVY
DRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDKWNRQVDILAEEGHI
SKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2359:63_M732 frame: 3 

TCYETLYAYLMMKRRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGRL 
VSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIK.MKEVLCARGLYLFSGPVGSGKT
TLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALIKLSLRHRPDILII 
GEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQELENSLKLIAYQRL 
IGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKIIPQETTESSPTF

>SEQ ID NO 2360:63_M781 frame: 3 

VEVNAQDIYIIPKGDCYEFYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQ 
LGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIKQMKEVLCARG 
LYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQLNEDIGMTYDALI 
KLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGVYDRLIELGVNYQE
LENSLKLIAYQRLIGGGSLIDFETSNFKKHSSDKWNRQVDILAEEGHISKKQAQVEKIIP
QETTESSPTF 

>SEQ ID NO 2361: 63_COH1 frame: 3

VIVMKFYMRIDDERRFIDVFEFNRMASLISHFKFVAGMNVGEKRRSQLGSCDYELSEGRL
VSLRLSSVGDYRGQESLVIRTLYSGHQDLKYWFDNIK 

SEQ ID NO. 2401: SAG0290 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGACCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG 
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 

SEQ ID NO. 2402: SAG0290 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATRAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA 
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2403: SAG0290 FROM THE 2603 V/R GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG 
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAG 

SEQ ID NO. 2404: SAG0290 FROM THE 090 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC 
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2405: SAG0290 FROM THE A909 GBS TYPE Ia STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATNNTAATAAAAAACCANTAAAAA 
TNAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGT 

SEQ ID NO. 2406: SAG0290 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA 
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 

SEQ ID NO. 2407: SAG0290 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA
ATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2408: SAG0290 FROM THE H36b GBS TYPE Ib STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAA
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

SEQ ID NO. 2409: SAG0290 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA
ATTCAAAGGTTATGATGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATACAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACCGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGGAAAATTGACTTTATCCTATATGATGCC
ATTTCATCCGACTATATTGTAAAAGACCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ACTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTAATAAAGTTTTGAAAGAAA
ATGGTA

SEQ ID NO. 2410: SAG0290 FROM THE M732 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA 
ATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA 

SEQ ID NO. 2411: SAG0290 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

GTATCAGTTCAGGCGTCAGAGAAAGTAGAACTTAAAGTAGCTACAGATTCTGACACGGCACCATTTACTTATCAAAAAGACGGGAA
ATTCAAAGGTTATGACGTTGATGTTGTCAAAGCTGTTTTTAAAGGTAGTAAGTACAAAGTAACCTTCAAGACAGTTCCTTTTGATA 
CTATTTCAACAGGTATTGATGCAGGGAAATTTGATTTATCAGCTAATGATTTTTCATATAATAAAGAAAGAGCAGAAAAATATCTC
TTCTCAGATCCTATATCCCGTTCAAATTATGCCGTAGTAGGGAAGAAGGGGAGCCATTACAAATCATTAAGTGACCTCTCTGGAAA 
ATCAACAGAAGTTTTATCTGGCGTTAACTATGCACAGGTTCTAGAAAATTGGAATAAAAATCATCCTAATAAAAAACCAATAAAAA 
TCAAATATGTTTCTGGGACAACTGGTGTTACTAGCAGATTAAAAAATATTGAGAGTGGAAAAATTGACTTTATCCTATATGATGCC 
ATTTCATCTGACTATATTGTAAAAGATCAATCATTAAACTTAAGCGTTTCTCCTTTGAAAGGTAAAATTGGTAATAATAAGGATGG
ATTAGAATACCTCCTTTTACCAAAAGATAAAAAAGGTAAAACTCTACAGAAATTTATAAATAAGCGTATTAAAGTTTTGAAAGAAG
ATGGTACTTTGGCACGTTTAAGTAAACAATATTTCGGTGGAGATTACGTTTCAAACATTGATAAA

>SEQ ID NO 2450: 8_1169NT frame: 1

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS 
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2451:8_18RS21 frame: 1 

VSVQASEKVELKVATDSDTAPFTYXKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2452: 8_2603 frame: 2

FKGYDVDVVKAVFKGSKYKVTFKTVPFDTISTGIDAGKFDLSANDFSYNKERAEKYLFSD
PISRSNYAVVGKKGSHYKSLSDLSGKSTEVLSGVNYAQVLENWNKNHPNKKPIKIKYVSG
TTGVTSRLKNIESGKIDFILYDAISSDYIVKDQSLNLSVSPLKGKIGNNKDGLEYLLLPK 
DKK 

>SEQ ID NO 2453:8_090 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2454:8_A909 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHXNKKPXKXKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKR 

>SEQ ID NO 2455: 8_CJB110 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2456: 8_COH1 frame: 1

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2457:8_H36B frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKENGTLARLSKQY
FGGDYVSNIDK

>SEQ ID NO 2458:8_JM9130013 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS 
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRNKVLKENG 

>SEQ ID NO 2459:8_M732 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK 
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

>SEQ ID NO 2460:8_M781 frame: 1 

VSVQASEKVELKVATDSDTAPFTYQKDGKFKGYDVDVVKAVFKGSKYKVTFKTVPFDTIS
TGIDAGKFDLSANDFSYNKERAEKYLFSDPISRSNYAVVGKKGSHYKSLSDLSGKSTEVL
SGVNYAQVLENWNKNHPNKKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVK
DQSLNLSVSPLKGKIGNNKDGLEYLLLPKDKKGKTLQKFINKRIKVLKEDGTLARLSKQY 
FGGDYVSNIDK 

SEQ ID NO. 2501: SAG0368 FROM THE 090 GBS TYPE Ia STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA 
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA 
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA 
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTAGTATGGTACTAGTGCTAGTAATGATT
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT 
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2502: SAG0368 FROM THE 1169NT1 GBS TYPE V STRAIN 

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA 
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTTGGTCAGGAAATAGCGATTCTATGATC 
TTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAA 
TAATGGACAGACTGGCGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACT 
TATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTA 
ACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAA 
TGGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAA 
TTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAA 
ACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAA 
AGGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGA
AAGAACTAGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGAT 
TCTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTAC 
TTATAGTTCTGAGACTAATCAAACAACTCATCAAAGTTACTATAATAGTAGCACTCCTGCTAATAACTATAGCAGTAACACTAACA
CAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAATGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2503: SAG0368 FROM THE 18RS21 GBS TYPE II STRAIN

TATAATTTTTCGACTAATGAATTGTCTAAGACTTTTAAAGATTTTAAGCTAGCTAAATCAAAAAGTCATGCTATTGAAGAAACAAA
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2504: SAG0368 FROM THE 2603 V/R GBS TYPE V STRAIN 

TAT AAT TT T T CGACTAAT GAATTGTCT AAGACTTTTAAAGATT T TAAGCTAGCT AAAT CAAAAAGT CAT GCTATT GAAGAAACAAA 
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CT AAT AAAT T T GACT T T C C AAT AT C AAT T G C T G C C AATGAACC AG AGT AC AAG GCTGTTGTT G AAC C AGG GAC AC AT AAAAT AAAT 
GG AG AACAAGC ACTT GTTT ATT CT CGTATGCGCTATGATG AT CC AG AGGG AG ATT ATGGGCGTCAAAAAAGAC AAC GTG AAGT AAT 
T CAAAAAGT C CTT AAAAAAAT ATTGGCGTT AAAT AGTATT AGTT CAT ACAAAAAAATT CTTT CCGCAGT AAGTAAT AAC ATGCAAA 
CT AAT AT T G AGAT AT CAT C AAAAAC GAT T C C T AATT T GT T AG CTT AT AAAG AT T CAT T GGAAC AT AT T AAAT CT T AT C AGT T G AAG 
GGT GAAGACGCT ACTTTAT CAGATGGTGGCT CTTAT CAAATT TTAACTAAGAAAC AT CT ACTTGC AGTT CAAAATAGAATTAAGAA 
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CT T CT ACTT ATT CAT CAACACAAGAGAATAATT AT AAT ACAAC ACCT TATT CAGAAGC ACC ACCAAGTTACAGT GGTAAT ACTACT 
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2505: SAG0368 FROM THE A909 GBS TYPE Ia STRAIN

TAT AAT T TT T CGACTAATGAAT TGT CTAAGACTT TT AAAG ATTTTAAGCTAGCTAAAT CAAAAAGT CATGCTATTGAAGAAAC AAA 
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
T AGT C AC TAT AAAT C C T AAAAC T AAT AAAAC AAC GAT GAC AAG C T TAG AAC GT GAC GT AT T GAT T AAAT T GAGT G GT C C C AAAAAT 
AAT GGACAGACT GGAG TAG AAGCAAAGC T AAAT G C AG C C TAT GCTTCTGGT GGT G C GG AAAT G G C ATT GAT G ACT GT T C AAG ACT T 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CT AAT AAAT T T G ACT T T C C AAT AT C AAT T G CT G C C AAT G AAC C AGAGT AC AAG GCTGTTGTT GAAC C AGG GAC AC AT AAAAT AAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT 
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA 
CT AAT ATTGAGAT AT C AT CAAAAACGATT CCTAAT TTGT T AGCT T ATAAAGATT CATTGGAACATAT T AAAT CTT ATCAGTT GAAG 
GGT G AAG AC G C TACT T TAT C AG AT GGTGGCTCT TAT C AAAT T T T AAC T AAGAAACAT CT AC T T G C AG T T C AAAAT AGAAT T AAG AA 
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CTTCTACTTATTCATCAACACAAGAGAATAATTATAATACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACT 
TATAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACAC 
AGGT C AG G C T GAT T C AAG T G G AAGT GT C AAT AAT CAT AAC GGG G CT G C AAC G C C T AAT C C A 

SEQ ID NO. 2506: SAG0368 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT) 

TAT AAT T T T T CG AC T AAT G AAT TGT CT AAG ACT T T T AAAG AT T T T AAG CT AG CT AAAT CAAAAAGT CAT G CT AT T GAAGAAACAAA 
GCCGTTTT C AAT ACT AT T AAT G G G G GT GG AC AC AG GT T C AG AG CAT C G AAAAT C T AAGT GGT C AGG AAAT AGCG AT T C TAT GAT CT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
ATTAGATATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTAGTCAATGCTGTTGGTGGTATAACAGTAA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GG AG AAC AAGC AC T T GT T TAT T CT C GT AT G CG CT AT GAT GAT C C AG AG GGAG AT TAT GGG C GT C AAAAAAG AC AAC GT G AAGT AAT 
TCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAA 
CT AAT AT TG AG AT AT CAT CAAAAACGATT CCT AATT TGT TAGCTT AT AAAG AT TC ATT GGAAC AT ATT AAAT CTT AT CAGTTGAAG 
GGT GAAG ACG C T ACT T T AT C AG AT GGTGGCTCT TAT C AAAT T T T AAC T AAG AAAC AT C T ACT T GC AGT T C AAAAT AGAAT T AAG AA 
AGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATT 
CT T CT ACTT ATT CATC AAC ACAAG AG AAT AAT TAT AAT AC AAC ACCT TATT CAGAAGC ACC ACCAAGTTACAGT GGT AAT ACT ACT 
TATTAGTTCTGAGACTAATCAAACAACTCATCAAAATTACTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACA 
CAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2507: SAG0368 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

GATTTTAAGCTAGATAAATCAAAAAGTCATGCTATTGAAGAAACAAAGCCGTTTTCAATACTATTAATGGGTGTGGACACAGGTTC 
AGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCTTAGTCACTATAAATCCTAAAACTAATAAAACAACGATGA
CAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAATAATGGACAGACTGGCGTAGAAGCAAAGCTAAATGCAGCC 
TATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTTATTAGATATTAATGTTGATTACTTTATGCAAATTAATAT 
GCAAGGATTAGTTGATTTGGTCAATGCTGTTGGTGGTATAACAGTAACTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATG 
AACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAATGGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGAT 
GATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAAAGTCCTTAAAAAAATATTGGCGTTAAATAGTAT
TAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATATTGAGATATCATCAAAAACGATTCCTAATTTGT
TAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTACTCTATCAGATGGTGGCTCTTATCAA
ATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAGCTGGATAAAAAGCGTAGTAAAACTCTGAAGACAAG
CGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTACTTATTCATCAACACAAGAGAATTATTATTATA 
CAACACCCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAGTTA 
CTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTTAATAATTATA 
ACGGGGCTGCAACGCCTAATCCAAACACAGGAACGCAACCAGTACCAGGTCAAACTAATCCA

SEQ ID NO. 2508: SAG0368 FROM THE H36b GBS TYPE Ib STRAIN

T AT AATTTTT CGACT AAT GAAT T GT CT AAG AC TTTT AAAGAT TTTAAGCTAGCT AAAT CAAAAAGTC AT GCT AT T GAAGAAACAAA 
GCCGTTTTCAATACTATTAATGGGGGTGGACACAGGTTCAGAGCATCGAAAATCTAAGTGGTCAGGAAATAGCGATTCTATGATCT 
TAGTCACTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
AT T AGAT AT T AAT GT T GAT TAG T T TAT GC AAAT T AAT AT G C AAG G AT T AGT T GAT T T AGT C AAT G C T GT T GGT G GT AT AAC AGT AA 
CTAATAAATTTGACTTTCCAATATCAATTGCTGCCAATGAACCAGAGTACAAGGCTGTTGTTGAACCAGGGACACATAAAATAAAT 
GGAGAACAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT 
T C AAAAAGT C C T T AAAAAAAT AT T G G C GT T AAAT AGT A 

SEQ ID NO. 2509: SAG0368 FROM THE 

T T AGT T CAT AC AAAAAAAT T CT T T CCGC AGT AAGT AAT AAC AT GC AAACT AAT AT T GAG AT AT C AT C AAAAACG AT T CCT AAT T T G 
TTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAGGGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCA 
AATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAAAGAACTGGATAAAAAGCGTAGTAAAACTCTGAAGACAA 
GC GC G AT T C TAT AT GAAG AT TACT AT G GT AC T AC T GC T AGT AAT GAT T CT T CT ACT TAT T CAT C AAC AC AAGAG AAT AAT TAT AAT 
ACAACACCTTATTCAGAAGCACCACCAAGTTACAGTGGTAATACTACTTATAGTTCTGAGACTAATCAAACAACTCATCAAAATTA 
CTATAATAGTAGCACTCCTGCTAGTAACTATAGCAGTAACACTAACACAGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATA 
ACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2510: SAG0368 FROM THE JM9130013 GBS TYPE VIII STRAIN (REVERSE COMPLEMENT)

TAT AATTTTT CGACT AAT GAAT T GTCT AAG ACT TTT AAAGAT T TT AAGCT AGCT AAAT C AAAAAGT CAT GCT ATT GAAGAAACAAA 
GCCGTTTT C AAT AC TAT T AAT GGG G GT GG AC AC AGGT T C AGAG CAT C G AAAAT CT AAGT GGT C AG G AAAT AG CG AT T C T AT GAT C T 
T AGT C ACT AT AAAT CCT AAAACT AAT AAAAC AACG AT G AC AAG CT T AG AAC GT G AC GT AT T GAT T AAAT T G AGT G GT C C C AAAAAT 
AATGGACAGACTGGAGTAGAAGCAAAGCTAAATGCAGCCTATGCTTCTGGTGGTGCGGAAATGGCATTGATGACTGTTCAAGACTT 
AT TAG AT AT T AAT GT T GAT TACT T T AT G C AAAT T AAT AT G C AAGG AT T AGT T GAT T T AGT C AAT GCTGTTGG T GGT AT AAC AG T AA 
C T AAT AAAT T T G ACT T T C C AAT AT C AAT T G C T G C C AAT GAAC C AG AGT AC AAGG C T G T T GT T GAAC C AG G GAC AC AT AAAAT AAAT 
GGAGAACAAgCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAAT 
T C AAAAAGT CCT T AAAAAAAT AT T GG CGT T AAAT AGT AT T AGT T CAT AC AAAAAAAT TCTTTCCG C AGT AAG T AAT AAC AT G C AAA 
CTAATATTGAGATATCATCAAAAACGATTCCTAATTTGTTAGCTTATAAAGATTCATTGGAACATATTAAATCTTATCAGTTGAAG 
GGTGAAGACGCTACTTTATCAGATGGTGGCTCTTATCAAATTTTAACTAAGAAACATCTACTTGCAGTTCAAAATAGAATTAAGAA 
AG AACT GGAT AAAAAGC GT AGT AAAAC T CT GAAG AC AAG CG C GAT T C TAT AT GAAG AT TAG TAT G GT AC T AC T G CT AGT AAT GAT T 
CT T CT ACT TAT T CAT C AAC AC AAG AG AAT AAT TAT AAT AC AAC AC CT T AT T C AG AAG C AC C ACC AAGT T AC AGT GG T AAT ACT ACT 
TAT AGT T CT G AG AC T AAT C AAAC AACT CAT CAAAAT TACT AT AAT AG TAG C AC T C C T GCT AGT AACT AT AG C AGT AAC ACT AAC AC 
AGGTCAGGCTGATTCAAGTGGAAGTGTCAATAATCATAACGGGGCTGCAACGCCTAATCCA 

SEQ ID NO. 2511: SAG0368 FROM THE M781 GBS TYPE III STRAIN (REVERSE COMPLEMENT) 

T T C AAT ACT AT T AAT GG GT GT GG AC AC AGGT T C AGAG CAT C G AAAAT C T AAGT GG T C AG G AAAT AGC GAT T C TAT GAT CT T AGT C A 
CTATAAATCCTAAAACTAATAAAACAACGATGACAAGCTTAGAACGTGACGTATTGATTAAATTGAGTGGTCCCAAAAATAATGGA 
C AG ACT GG C G TAG AAG C AAAG CT AAAT GC AGC CT AT GCTTCTGGTGGTGCG G AAAT G GC AT T GAT G ACT GTT C AAG ACTT ATT AGA 
TATTAATGTTGATTACTTTATGCAAATTAATATGCAAGGATTAGTTGATTTGGTCAATGCTGTTGGTGGTATAACAGTAACTAATA 
AATTT G ACTTT C C AAT AT C AAT T GCT GCC AAT G AACC AGAGT AC AAGGCT GT T GT T GAAC C AGGGAC AC AT AAAAT AAAT GGAG AA 
CAAGCACTTGTTTATTCTCGTATGCGCTATGATGATCCAGAGGGAGATTATGGGCGTCAAAAAAGACAACGTGAAGTAATTCAAAA 
AGTCCTT AAAAAAAT ATTGGCGTTAAAT AGT ATTAGTTCATACAAAAAAATTCTTTCCGCAGTAAGTAATAACATGCAAACTAATA 
T T GAG AT AT CAT C AAAAACGAT T C C T AAT T T G T T AG C T TAT AAAG AT T CAT T G GAAC AT AT T AAAT C T T AT C AGT T GAAG GGT GAA 
GAC GC TACT C T AT C AG AT GG T GG C T C T TAT C AAAT T T T AAC T AAG AAAC AT C T ACT T G C AG T T CAAAAT AG AAT T AAG AAAG AG C T 
GGATAAAAAGCGTAGTAAAACTCTGAAGACAAGCGCGATTCTATATGAAGATTACTATGGTACTACTGCTAGTAATGATTCTTCTA 
CT TAT T CAT C AAC AC AAG AG AAT AAT TAT AAT AC AAC AC CT TAT T C AG AAG C AC C AC C AAG T TAG AG T GGT AAT AC T AC T TAT AG T 
T C T GAG AC T AAT C AAAC AAC T CAT C AAAGT T AC T AT AAT AGT AG C AC T C C T G C T AGT AAC TAT AG C AGT AAC AC T AAC AC AG GT C A 
GGCTGATTCAAGTGGAAGTGTTAATAATTATAACGGGGCTGCAACGCCTAATCCAAACACAGGAACGCAACCAGTACCAGGTCAAA 
CTAATCCA 

>SEQ ID NO 2550: 54_090 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS 
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP 
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2551:54_1169NT frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKLVRK. RFYDLSH 
YKS .N. . NNDDKLRT . RID . IEWSQK . WTDWRRSKAKCSLCFWWCGNGIDDCSRLIRY . C 
. LLYAN . YARIS . FSQCCWWYNSN . . I . LSNINCCQ . TRVQGCC . TRDT . NKWRTSTCLF 
SYAL. . SRGRLWASKKTT . SNSKSP . KNIGVK. Y. FIQKNSFRSK. . HAN . Y . DIIKNDS 
. FVSL . RFIGTY . ILSVER . RRYFIRWWLLSNFN . ETSTCSSK . N . ERTR . KA . . NSEDK 
RDS I . RLLWYYC . . . FFYLFINTRE . L . YNTLFRSTTKLQW . YYL . F. D . SNNSSKLL . . 
.HSC. . L.Q.H.HRSG. FKWKCQ . S . WGCNA . S 

>SEQ ID NO 2552:54_18RS21 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2553:54_2603 frame: 1

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2554: 54_A909 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP 
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2555:54_CJB110 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS 
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTY. F . D . SNNSSKLL . . 

>SEQ ID NO 2556:54_COH1 frame: 1

DFKLDKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSL 
ERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINVDYFMQINMQGLVD 
LVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGR
QKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIPNLLAYKDSLEHIK
SYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTAS 
NDSSTYSSTQENYYYTTPLFRSTTKLQW . YYL . F . D . SNNSSKLL . . .HSC. .L.Q.H.H 
RSG . FKWKC . . L. RGCNA . SKHRNATSTRSN . S 

>SEQ ID NO 2557:54_H36B frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS 
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP
>SEQ ID NO 2558:54_JM9130013 frame: 1 

YNFSTNELSKTFKDFKLAKSKSHAIEETKPFSILLMGVDTGSEHRKSKWSGNSDSMILVT 
INPKTNKTTMTSLERDVLIKLSGPKNNGQTGVEAKLNAAYASGGAEMALMTVQDLLDINV 
DYFMQINMQGLVDLVNAVGGITVTNKFDFPISIAANEPEYKAVVEPGTHKINGEQALVYS
RMRYDDPEGDYGRQKRQREVIQKVLKKILALNSISSYKKILSAVSNNMQTNIEISSKTIP
NLLAYKDSLEHIKSYQLKGEDATLSDGGSYQILTKKHLLAVQNRIKKELDKKRSKTLKTS 
AILYEDYYGTTASNDSSTYSSTQENNYNTTPYSEAPPSYSGNTTYSSETNQTTHQNYYNS 
STPASNYSSNTNTGQADSSGSVNNHNGAATPNP 

>SEQ ID NO 2559:54_M781 frame: 2 

SILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSLERDVLIKLSGPKNNGQTG
VEAKLNAAYASGGAEMALMTVQDLLDINVDYFMQINMQGLVDLVNAVGGITVTNKFDFPI 
SIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGRQKRQREVIQKVLKKILAL
NSISSYKKILSAVSNNMQTNIEISSKTIPNLLAYKDSLEHIKSYQLKGEDATLSDGGSYQ 
ILTKKHLLAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTASNDSSTYSSTQENNYNTTP 
YSEAPPSYSGNTTYSSETNQTTHQSYYNSSTPASNYSSNTNTGQADSSGSVNNYNGAATP 
NPNTGTQPVPGQTNP 

SEQ ID NO. 2601: SAG0503 FROM THE 090 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

GGGCACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCC 
TAACAAAGAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGT 
TTTGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAG 
T C AAC AAAT T T T AAAAC G TAT GAG GAC AGAT C C T C AAAT CG AAAAAGAT T T AG AG AAAG C T GAT T TAT T G AC G C T AAC TGTTGGTG 
GTAATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAA 
CGTTTGAAAGAAATACTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCT 
AAAC T T T C C AC AAT T AAC T AAAAT G C AAAC C GT T AT T GAT AAT T GG AAT AAAGC T AC AAAAGAAG T AG T T GAT G CT T C AG AAAAT G 
TTTATTTTGTCCCAATTAATGACCGCCTTT AT AAGGGAAT AAAT GGT AAAG AGGGT ATT AC AGAGT CATC AAAT AGTCAGGCAAGT 
ATCACTAATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAA 
AATAAATGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAG

SEQ ID NO. 2602: SAG0503 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

T T T GT AC AAAAAAG C AGG C T CT AT TTTTTCCTT GAT CAT T C C AAAAT C AAAT C C T AAAT T AAC AAAAAAAG AC T T C C T AAC AAAG A 
AAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCCA
CT G C T AT C AG AAT C ACT C CAT AAT C GAT AC T CT T AC C AAGT G AC T T C T G T T AAT TAT GGT G T GT CT GG G AAT AC T AGT C AAC AAAT 
T T T AAAACG T AT G ACG AC AG AT C C T C AAAT C G AAAAAGAT T TAG AG AAAG C T GAT T T AT T GAC G CT AACT GT T G G T G GT AAT GAT G 
T C T T G G C T GT T AT T C GT AAAG AG C T C AGT CAT T TAT C AC T AAAT T C C T T T GAG AAAC C AG C AG AAGC AT AT AAGG AAC G T T T G AAA 
GAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTCC 
ACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTTG 
TCCCAATTAATGACCGCCTTTAT AAGGGAAT AAATGGTAAAGAGGGTATTATAGAGTCATCAAAT AGTCAGGCAAGT ATCACT AAT 
GAT G C T CT C T TT ACT GG AG AC C AT T T T CAT C C C AAT AAT AT T G G CT AT C AAAT CAT G T CT AAC GC C G T T AT G G AGAAAAT AAAT GA 
AACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGGTCC 

SEQ ID NO. 2603: SAG0503 FROM THE 18RS21 GBS TYPE II STRAIN (REVERSE COMPLEMENT) 

GTT T GT AC AAAAAAGCAGG CT CTATTTTTT CCT T G AT C ATT CC AAAAT C AAAT CCT AAAT T AAC AAAAAAAG ACTTCCT AAC AAAG 
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC 
AC T G C TAT C AG AAT C AC T C CAT AAT C GAT AC T C T T AC C AAGT G ACT T C T GT T AAT TAT GGTGTGTCT GGG AAT AC T AG T C AAC AAA 
T T T T AAAACGT AT G ACG AC AG AT CCT C AAAT CG AAAAAGAT T T AG AG AAAG CT GAT T T AT T G ACG CT AAC TGTTGGTGGT AAT GAT 
GT C T T GGCT GT TAT T C GT AAAG AG C T C AGT CAT T T AT C ACT AAAT T C CT T T GAG AAAC C AG C AG AAG C AT AT AAGG AACGT T T G AA 
AGAAAT CCT TGC AAAAGC AAGACAAGAT AAT CCT AAATTGCCT ATT T ATGT TTT AGGCAT T T AT AAT CCT T TT T ACCT AAACT TT C 
CACAATTAACT AAAAT GC AAAC CGTT AT TGATAATT GGAATAAAGCTACAAAAG AAGT AGTTGATGCTT CAGAAAATGTTT ATTT T 
GT C C C AATT AAT G ACC GCCT T T AT AAGGGAAT AAAT GGT AAAG AGGGT AT T AC AG AGT CAT C AAAT AGT C AGGC AAGT AT C ACT AA 
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG 
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2604: SAG0503 FROM THE COH1 GBS TYPE III STRAIN (REVERSE COMPLEMENT)

GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTA 
AC AAAG AAAGT T AT C C C ACT T AACT AT GT T G CT CTT G GAG AT T CT CT GAC C G AAG GT GT G GGG GAT AC AAC CT CT C AAG GT G GT TT 
TGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTC 
AACAAATTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGT 
AATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACG 
TTTGAAAGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTAGCTAA
ACTTTCCACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTT 
TAT T T T GT C C C AAT T AAT G AC C GC CT T T AT AAG G GAAT AAAT G GT AAAGAGGGT AT T AC AGAGT C AT C AAAT AGT C AGG C AAGT AT 
CACTAATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAA 
TAAATGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA

SEQ ID NO. 2605: SAG0503 FROM THE CJB110 GBS NONTYPEABLE STRAIN (REVERSE COMPLEMENT)

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAG 
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTCCC 
ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAAA 
TTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGAT 
GTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAA 
AG AAAT AC T T G C AAAAGC AAG AC AAG AT AAT C C T AAAT T G C CT AT T TAT GT T T T AGG CAT T TAT AAT C CT T T T T AC C T AAACT T T C 
CACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTT 
GT CC C AAT TAATGACCGCCTTT AT AAGGGAAT AAAT GGT AAAGAGGGT ATT AC AG AGTC AT C AAAT AGT CAGGC AAGT AT C ACT AA 
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG 
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAA

SEQ ID NO. 2606: SAG0503 FROM THE 1169NT1 GBS TYPE V STRAIN (REVERSE COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAG 
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGGGATACAACCTCTCAAGGTGGTTTTGTCCC 
ACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAAA 
TTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGAT 
GTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAA 
AGAAATTCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTC 
CACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTT 
G T C C C AAT T AAT G AC CG C C T T TAT AAG GGAAT AAAT GGT AAAGAGGGT AT T AC AG AGT CAT C AAAT AGT CAGGC AAGT AT C AC T AA 
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG 
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

SEQ ID NO. 2607: SAG0503 FROM THE JM9130013 GBS TYPE VIII STRAIN 
(REVERSE COMPLEMENT) 

GTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAAG 
AAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTCC 
ACT G C TAT C AG AAT C ACT C CAT AAT C GAT AC T C T T AC C AAGT G AC T T CT GT T AAT TAT GGT GTGTCTGG GAAT AC T AGT C AAC AAA 
TTTTAAAACGTATGACGACAGATCCTCAAATCGAAAAAGATTTAGAGAAAGCTGATTTATTGACGCTAACTGTTGGTGGTAATGAT 
GTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGAA 
AGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTTC 
CACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTTT 
GT C C C AAT T AAT G AC C G C CT T TAT AAG GGAAT AAAT G GT AAAG AGG GT AT T AC AGAGT CAT C AAAT AGT CAGGC AAGT AT C AC T AA 
TGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAATG 
AAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

SEQ ID NO. 2608: SAG0503 FROM THE 2603 V/R GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTAACAAA 
GAAAGTTATCCCACTTAACTATGTTGCTCTTGGAGATTCTCTGACCGAAGGTGTGGGCGATACAACCTCTCAAGGTGGTTTTGTTC 
CACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTCAACAA 
AT T T T AAAAC GT AT G ACG AC AGAT CCT C AAAT C G AAAAAG AT T TAG AG AAAG CT GAT T T AT T G AC G C T AAC T GT T GGT G GT AAT G A 
TGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACGTTTGA 
AAGAAATCCTTGCAAAAGCAAGACAAGATAATCCTAAATTGCCTATTTATGTTTTAGGCATTTATAATCCTTTTTACCTAAACTTT 
CCACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTTTATTT 
TGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTATCACTA 
ATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAATAAAT 
GAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAAGTGG

SEQ ID NO. 2609: SAG0503 FROM THE M781 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GGACAAGTTTGTACAAAAAAGCAGGCTCTATTTTTTCCTTGATCATTCCAAAATCAAATCCTAAATTAACAAAAAAAGACTTCCTA 
AC AAAG AAAGT TAT C C C AC T T AAC TAT GTTGCTCT T GG AG AT T CT C T G AC C G AAGGT GT GG G G GAT AC AAC CT CT C AAGGT GGT TT 
TGTCCCACTGCTATCAGAATCACTCCATAATCGATACTCTTACCAAGTGACTTCTGTTAATTATGGTGTGTCTGGGAATACTAGTC 
AAC AAAT T T T AAAAC GT AT G AC G AC AG AT CCT C AAAT CG AAAAAG AT T T AGAG AAAGC T GAT T TAT T G AC G C T AAC TGTTGGTGGT 
AATGATGTCTTGGCTGTTATTCGTAAAGAGCTCAGTCATTTATCACTAAATTCCTTTGAGAAACCAGCAGAAGCATATAAGGAACG 
TTT GAAAGAAATT CT TGC AAAAGC AAG AC AAG AT AAT CCT AAAT TGCCT ATT TATGTTTTAGGC ATT TAT AAT CCTTTTTACCTAA 
ACTTTCCACAATTAACTAAAATGCAAACCGTTATTGATAATTGGAATAAAGCTACAAAAGAAGTAGTTGATGCTTCAGAAAATGTT
TATTTTGTCCCAATTAATGACCGCCTTTATAAGGGAATAAATGGTAAAGAGGGTATTACAGAGTCATCAAATAGTCAGGCAAGTAT 
CACTAATGATGCTCTCTTTACTGGAGACCATTTTCATCCCAATAATATTGGCTATCAAATCATGTCTAACGCCGTTATGGAGAAAA 
TAAATGAAACAAGAAAAAACTGGCCGAACCCAGCTTTCTTGTACAAA 

>SEQ ID NO 2650:103_090 frame: 2 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVP 

LLSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLA 

VIRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKM 

QTVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDH

FHPNNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2651:103_H36B frame: 2 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS 

ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR 

KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 

IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGIIESSNSQASITNDALFTGDHFHP

NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2652:103_18RS21 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS 

ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR 

KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 

IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP 

NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2653:103_COH1 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPL 

LSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAV 

IRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQ 

TVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHF

HPNNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2654:103_CJB110 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS

ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR 

KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 

IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP

NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2655:103_1169NT frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS

ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR 

KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 

IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP

NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2656:103_JM9130013 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLS

ESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVIR 

KELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQTV 

IDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFHP 

NNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2657:103_2603 frame: 1

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLL

SESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAVI 

RKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQT 

VIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHFH

PNNIGYQIMSNAVMEKINETRKNWP 

>SEQ ID NO 2658:103_M781 frame: 3 

IFSLIIPKSNPKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPL
LSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLTVGGNDVLAV 
IRKELSHLSLNSFEKPAEAYKERLKEILAKARQDNPKLPIYVLGIYNPFYLNFPQLTKMQ
TVIDNWNKATKEVVDASENVYFVPINDRLYKGINGKEGITESSNSQASITNDALFTGDHF
HPNNIGYQIMSNAVMEKINETRKNWP 

SEQ ID NO. 2701: SAG1473 FROM THE 1169NT1 GBS TYPE V STRAIN
(REVERSE COMPLEMENT) 

GAT AC AAG T GAT AAG AAT AC T G AC AC GAGT GT C GT GAG T AC G AC C T T AT C T G AGGAGAAAAGAT C AG AT G AAC TAG AC C AGT CT AG 
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCGTCAACTAATCCACCTACAACAGAACCAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGT AT T AAT T T C AGAAGAT AGT AT T AAGAAT T T TAG T AAAG C AAGT AG T GAT C AAG AAG AAGT GG AT C GC G AT G AAT CAT CAT C 
TTCAAAAGCAAGTGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2702: SAG1473 FROM THE 18RS21 GBS TYPE II STRAIN 

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATCAGATGAACTAGACCAGTCTAG 
T ACT GGT TCTTCTTCT G AAAAT GAAT CGAGT T CAT C AAGT G AAC C AG AAAC AAAT C CGT C AAC T AAT C C AC C T AC AAC AGAAC CAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGT ATT AATTTCAGAAGATAGTATTAAGAATTTT AGT AAAGC AAGT AGT GAT CAAGAAGAAGTGGATCGCGATGAAT CAT CATC 
TTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2703: SAG1473 FROM THE 2603 V/R GBS TYPE V STRAIN 

GAT AC AAGT GAT AAG AAT ACT GAC ACG AGT GT C GT GAC T AC GAC CT T AT CT G AGGAGAAAAGAT C AG AT G AACT AGAC C AGT C TAG 
TACT GGT T CTTCTTCT G AAAAT GAAT C G AGT T CAT C AAGT G AAC C AG AAAC AAAT CCGT C AAC T AAT C C AC C T AC AAC AG AAC CAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGAACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGT AT T AAT T T C AGAAGAT AGT AT T AAG AAT T T T AGT AAAG C AAG T AGT GAT C AAGAAG AAGT GG AT C GC G AT GAAT CAT CAT C 
TTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2704: SAG1473 FROM THE 090 GBS TYPE Ia STRAIN

GACCAGTCT AGT ACTGGTT CTT CTT CTGAAAAT GAAT CGAGTT CAT CAAGTGAAC CAGAAAC AAAT CCGT CAACTAAT CCACCTAC 
AAC AGAAC CAT C G C AAC C C T C AC CT AGT G AAG AG AAC AAG C C T GAT GGT AG AAC GAAG AC AGAAAT T GG C AAT AAT AAGG AT AT T T 
CTAGTGGAACAAAAGT AT TAATTT C AGAAGAT AGT AT TAAGAATTT T AGT AAAGC AAGT AGT GAT C AAG AAG AAGT GG AT CGCGAT 
GAATCATCATCTTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2705: SAG1473 FROM THE A909 GBS TYPE Ia STRAIN

GATACAAGTGATAAGAATACTGACACGAGTGTCGTGACTACGACCTTATCTGAGGAGAAAAGATTAGATGAACTAGACCAGTCTAG 
TACT GGT T CTTCTTCT G AAAAT GAAT CGAGT T CAT C AAGT G AAC CAGAAAC AAAT C C C T C AAC T AAT C C AC C T AC AAC AG AAC CAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATTAATT T CAGAAGAT AGT ATT AAGAAT TTTAGT AAAGC AAGT AGTGAT CAAGAAGAAGT GGAT CGCGAT GAAT CATCAT C 
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2706: SAG1473 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

GAT AC AAG T GAT AAG AAT AC T GAC AC G AGT GT CGT GAC T AC GAC CT TAT C T GAG G AGAAAAGAT C AG AT G AAC T AGAC C AGT C T AG 
T ACTGGTT CTT CTT CTGAAAAT GAAT CGAGT T CAT CAAGT GAACCAGAAAC AAAT CCGT CAACTAAT CCACCTAC AAC AG AAC CAT 
C G C AAC C CT C AC C T AGT GAAG AG AAC AAG C C T GAT GGT AG AAC GAAG AC AG AAAT T G G C AAT AAT AAG GAT AT T T C T AGT G G AAC A 
AAAGT AT TAATT T CAGAAGAT AGT ATT AAGAATTT T AGT AAAG CAAGT AGT GAT CAAGAAGAAGT GGAT CGCGAT GAATCAT CAT C 
TTCAAAAGCAAATGATGGGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2707: SAG1473 FROM THE COH1 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GAT ACAAGTGAT AAGAAT ACTGACACGAGT GT CGT GACTACGACCT TAT CTGAGGAGAAAAG AT CAGATGAACTAGACC AGT CT AG 
T ACTGGTT CT T CTT CTGAAAAT GAAT CAAGT T CAT CAAGT GAACCAGAAAC AAAT CC CTCAACT AAT C CACCTACAACAGAACCAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATT AAT TT CAGAAGAT AGT ATT AAG AATT TT AGT AAAG CAAGT AGT GAT C AAG AAG AAGT GGAACGCGATG AAT CAT CAT C 
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2708: SAG1473 FROM THE H36b GBS TYPE Ib STRAIN

GAT ACAAGTGAT AAGAAT ACT G AC ACGAGTGT CGT GACTACGACCTT AT CTGAGG AGAAAAGAT TAGATGAACT AGAC CAGTCT AG 
T AC TGGTTCTTCTTCT G AAAAT GAAT CG AG T T CAT C AAGT G AAC CAGAAAC AAAT C C C T CAACTAAT C C AC CT AC AAC AG AAC CAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGTATT AAT TT CAGAAGAT AGTATT AAGAATTT T AGT AAAGCAAGT AGTGAT C AAG AAr AAGT GGAT CGCGAT GAAT CAT CAT C 
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2709: SAG1473 FROM THE JM9130013 GBS TYPE VIII STRAIN

GAT AC AAG T GAT AAG AAT AC T GAC AC G AGT GT CGT GAC TAG GAC C T TAT C T GAG GAG AAAAG AT TAG AT G AAC TAG AC C AG T CT AG 
TACTGGTTCTTCTTCTGAAAATGAATCGAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGTAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA
AAAGT AT T AAT T T C AGAAG AT AGT AT T AAG AAT T T T AG T AAAG C AAGT AG T GAT C AAGAAG AAG T GGAT C G C GAT G AAT CAT CAT C 
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA 

SEQ ID NO. 2710: SAG1473 FROM THE M732 GBS TYPE III STRAIN 

GAT AC AAGT G AT AAGAAT AC T G AC AC G AGT GT CGT G ACT AC GAC CT T AT C T GAGG AG AAAAG AT C AG AT GAAC TAG AC C AGT C TAG 
TACTGGTTCTTCTTCTGAAAATGAATCAAGTTCATCAAGTGAACCAGAAACAAATCCCTCAACTAATCCACCTACAACAGAACCAT 
CGCAACCCTCACCTAGTGAAGAGAACAAGCCTGATGGGAGCACGAAGACAGAAATTGGCAATAATAAGGATATTTCTAGTGGAACA 
AAAGT AT T AAT T T C AG AAG AT AGT AT T AAG AAT T T T AG T AAAG C AAGT AGT GAT C AAG AAGAAGT G GAAC G C GAT GAAT CAT CAT C 
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

SEQ ID NO. 2711: SAG1473 FROM THE M781 GBS TYPE III STRAIN 

GAT ACAAGT GAT AAGAAT ACTGACACGAGTGT CGT GACTACGACCTTAT CTGAGGAGAAAAGAT CAGAT GAACT AGACCAGT CTAG 
T AC TGGTTCTTCTTCT G AAAAT GAAT C AAGT T CAT C AAGT GAAC C AG AAAC AAAT C C CT C AAC T AAT C C AC C T AC AAC AGAAC C AT 
C G C AAC C C T C AC C TAG T GAAG AG AAC AAGC C T GAT GGG AGC AC GAAGAC AGAAAT T GG C AAT AAT AAGG AT AT T T CT AGT GG AAC A 
AAAG T AT T AAT T T C AG AAGAT AGT AT T AAG AAT T T T AGT AAAG C AAGT AG T GAT C AAG AAG AAG T GGAT CG CG AT GAAT CAT CAT C 
TTCAAAAGCAAATGATGAGAAAAAAGGCCACAGTAAGCCTAAAAAGGAA

>SEQ ID NO 2750:4_1169NT frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKASD 
GKKGHSKPKKE 

>SEQ ID NO 2751:4_18RS21 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND 
GKKGHSKPKKE 

>SEQ ID NO 2752:4_2603 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
GKKGHSKPKKE 

>SEQ ID NO 2753:4_090 frame: 1 

DQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQPSPSEENKPDGRTKTEIGNNKDISSG 
TKVLISEDSIKNFSKASSDQEEVDRDESSSSKANDGKKGHSKPKKE 

>SEQ ID NO 2754:4_A909 frame: 1 

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2755:4_CJB110 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGRTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
GKKGHSKPKKE 

>SEQ ID NO 2756:4_COH1 frame: 1

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2757:4_H36B frame: 1 

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEXVDRDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2758:4_JM9130013 frame: 1 

DTSDKNTDTSVVTTTLSEEKRLDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP 
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2759:4_M732 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVERDESSSSKAND
EKKGHSKPKKE 

>SEQ ID NO 2760:4_M781 frame: 1 

DTSDKNTDTSVVTTTLSEEKRSDELDQSSTGSSSENESSSSSEPETNPSTNPPTTEPSQP
SPSEENKPDGSTKTEIGNNKDISSGTKVLISEDSIKNFSKASSDQEEVDRDESSSSKAND
EKKGHSKPKKE

SEQ ID NO. 2801: SAG1552 FROM THE 1169NT1 GBS TYPE V STRAIN
(REVERSE COMPLEMENT)

tttgttgttaaaggtgatactgtacttcacaagcccaccaataaaccttttgttgttaaaggagtagacgttgagtcttccttagc 
aggttatcatcacaacgattttcctattactcaaaaaacgtatcgtgagtggttccatttaatttccaacatgggggcaaatactg 
taagagtcaaagtaccgatgaatgttgcattttacgatgctttatatcaccacaacaaagcatcaaagaggccactgtatttgttg 
caaggaatacgtatagattcttatcgcaataatgcttctataacagcttttaatgataattatagggggtatttaaaacgagaagc 
aaaaggcgttgtggatattctccatgggcgtaagcaagtatggaatactgattttggtagccgtcattatcattatgatcttagtc 
cttgggtacttggttatgtcgtaggggatgattggaatagtggtactgtcgcttatactaatcatcaagagaaaaaaacgcaatat 
aaaggacgttattttaaaacttctgcggcagctaatccatttgaggtcatgctagctcaagttAtggatgaattgacacattatga 
g ac agc t aaat at ggt t gg caac at t t gat t agt t t t t c aaac t c ac c aac aac ag ac c c t t t t c gt tat cgaaaac cat t t g agg 
cacaggctcctaaatacgtacaactaaatgtagaaaatattcaagctaattcgaatgttaaagcaggtatttttgcagcatataaa 
gc t at t gat t t c cat c c t c gat ac aag gat tat c tat tat t t gat aaagag aat at c agt aaagaag at ag ac aaaagat t aaaga 
actttctttgtcacagggatacgttaaactgctaaatgcttatcacaaaatccctgttctagtcacgggttatggctattcgacag 
cgagaggtattgcccaaaaagaaattgataaacgtcctctgccgattaatgaaaaagaacaaggtcagcgtttactagaagattat 
gaatcttttatatcatccggtagttttggagcgactatcaatgcatggcaagacgattggaatgcaagggcgtggaatacatcctt 
c g c c ac aaat aaac at agt c aat t c ct at gg g g g gat gc ac aagt at t t aat c aaggt t at g gt t t at tag g c t t t aaaaac gc aa 
aacatcattatcaagttgatggtaaaagaggcaaaggagagtggaaacatcctctg

SEQ ID NO. 2802: SAG1552 FROM THE

atgactagtgcaacaggagatgacttatatgctagcagtgatgaaagctatctctaccttgcgattaaaacaaaacctgaaaaact 
aaaagaaaaacgattattaccaatagatattacaccaaaatctggtagtagaaaaatgaatggtagtaaggtcacattttctaaat 
ctagtgactttgtattgtctattgatccaaatggcaagtctgaattatttgtccaagagcgctataatgccttaaaagcgaactat 
cttcgacagcttaacggtaaagatttttatgctttcccaccaaagaagaacagtagtaattttgagcagatcaatatggtattgag 
aaatacaaagattgttgaagacatggaaaaagtaaaagcaacagagaggttcttaccaactcatcctactggtcttctcaaaacag 
g aac aat t gat ag g c ac c aaaaaac at t t gat t c ac aaac ag at at ttcgtttg gaaagg ac t t tat ag ag gt c ag aat t c cgt gg 
c agt t gt t g aat t t t t c t gat c cat cat c t c aaaaaat t c ac gat g at t ac t t t aaac at tat ggt gt gaag gagt tag aaat t g a 
gagcattgctttaggattaggtgctaatagcaaagaaaacacactgataaagatggcagattatcgtttgaaaaattgggagagac 
ccgataccaaaacctttttaaaagactcctattatagtatttaagaaagaa

SEQ ID NO. 2803: SAG1552 FROM THE 18RS21 GBS TYPE II STRAIN

AAGG G C T T AT T AAAAG AAAAT AC AAG AAC T AAC T T T G T T G T T AAAG GT GAT AC T GT AC T T C AC AAG C C C AC C AAT AAAC C T T T T G T 
TGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGT 
TCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCAC 
AACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAA 
TGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATT 
TGGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCT 
TATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCT 
AGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAA 
CAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCA 
AATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAA 
T AT C AGT AAAG AAG AT AG AC AAAAG AT T AAAG AAC TTTCTTTGT C AC AG GG AT AC GT T AAAC T G C T AAAT G C T TAT C AC AAAAT C C 
CTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAA 
AAAG AAC AAGGT C AGC GT T TACT AG AAG AT TAT G AAT CT T T T AT AT CAT C C G G T AGT T T T G G AGC G AC TAT C AAT G C AT GGC AAG A 
CGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATC 
AAGG T TAT G GT T T AT T AGG C T T T AAAAAC G C AAAAC AT CAT TAT C AAGT T GAT G GT AAAAG AGG C AAAG GAG AGT G G AAAC AT C CT 
CTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAA 
ACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTA 
AATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAAC 
TATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATT 
GAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAA 
CAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCG 
TGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAAT 
T GAG AG CAT T G CT T TAG GAT TAG G T G C T AAT AG C AAAG AAAAC AC AC T GAT AAAG AT G G C AG AT TAT C GT T T G AAAAAT T G GG AG A 
GACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATGTATTAAGAAAGAA 

SEQ ID NO. 2804: SAG1552 FROM THE 2603 V/R GBS TYPE V STRAIN 
TATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAA
GGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTT 
AATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTAGGATGCCTTATATCACCACAACAAAG
CATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAAT
TATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTGGGTAG 
CCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTA 
ATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAA 
GTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCC 
TTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTA 
AAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGT
AAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCT 
AGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAAC 
AAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGG 
AATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTA 
TGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGA 
CTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAA 
GAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAG 
TGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTC 
GACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAAT 
ACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAAC
AACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGT
TGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGC
ATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGA
TACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAGAATGGTCTAAAGAAAGAGAGAGAACATATGGTCCA 

SEQ ID NO. 2805: SAG1552 FROM THE A909 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

AAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGT 
TGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGT 
TCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCAC 
AACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAA 
TGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATT 
TGGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCT 
TATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCT 
AGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAA 
CAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCA
AATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAA 
TATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCC
CTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAA 
AAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGA 
CGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATC 
AAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCT 
CTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAA 
ACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTA 
AATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAAC 
TATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATT 
GAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAA 
CAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCG
TGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAGAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAAA 
TTGAGAGCCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGA
GAGACCCGATACCAAAACCTTTTTAAAAGA 

SEQ ID NO. 2806: SAG1552 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

TATTACTTTGATGGTAGTTTGTATTTACCAAAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGT 
ACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTC 
CTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAAT
GTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTA
TCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCC 
ATGGGCGTAAGCAAGTATGGAATACAGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTA 
GGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTC 
TGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAAC 
ATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAA 
CTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATA
CAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACG
TTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAA 
ATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAG 
TTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAATCAAT 
TCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGT 
AAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCT
CTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAA 
AAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTC 
CAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAG 
TAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCT
TACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTT 
GGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTT
TAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGA
TGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATGTATTAAGAAAGA 

SEQ ID NO. 2807: SAG1552 FROM THE COH1 GBS TYPE III STRAIN

TTTACCACAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAAC 
CTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGT 
GAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATA 
TCACCACAACAAAGAATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAG 
CTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAAT 
ACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTAC 
TGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGG 
TCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCA 
CCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGC
TAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATA 
AAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCAC
AAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGAT 
TAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCAT 
GGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTA 
TTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAA 
ACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAAC
CTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACA 
TTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAA 
AGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATA 
TGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTT 
CTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAACCAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAG 
AATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGT 
TAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAAT 
TGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACT

SEQ ID NO. 2808: SAG1552 FROM THE H36b GBS TYPE Ib STRAIN

AAGGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTG 
TTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGG 
TTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCA
CAACAAAGCATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTA 
ATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGAT 
TTTGGTAGCAGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATGGACATAGTGGTACTGTCGC 
TTTATACTAATCATCT^AGAGGAGAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCAT 
GCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAA
CAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAAT
TCGAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGA 
GAATATCAGTAAAGAAGATAGACAAAAGATTAAAGT^ACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAA
TCCCTGTTCTAGTCACGGGTTATGGCTACTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAAT 
GAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCA 
AGACGATTGGAATGCAAGGGTGTGGAATACATCCTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTA 
ATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAGGTTGATGGTAAAAGAGGCAAAGAAGAGTGGAAACAT 
CCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGA 
AAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTT
CTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAACGCCTTAAAAGCG 
AACTATCTTCGACAGCTTAATGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGT 
ATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCA
AAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATT
CCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGA
AATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGG
AGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGT 

SEQ ID NO. 2809: SAG1552 FROM THE JM9130013 GBS TYPE VIII STRAIN 

ACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTA 
GCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATAC 
TGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGCATCAAAGAGGCCACTGTATTTGT
TGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAA 
GCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCAGTCATTATCATTATGATCTTAG 
TCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAAT 
ATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTAT 
GAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGA 
GGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCGAATGTTAAAGCAGGTATGTTTGCAGCATATA
AAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAA
GAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTACTCGAC 
AGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATT 
ATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGTGTGGAATACATCC 
TTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGC 
AAAACATCATTATCAGGTTGATGGTAAAAGAGGCAAAGAAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTAT
ATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGAT
ATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCC 
AAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAACGCCTTAAAAGCGAACTATCTTCGACAGCTTAATGGTAAAGATTTTT
ATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAA 
AAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATT
TGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCAT
CTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAAT
AGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTC
CTATTATAGTATTAAGAAAG 

SEQ ID NO. 2810: SAG1552 FROM THE M732 GBS TYPE III STRAIN 

TACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCACAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTG 
AGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTACTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATG 
GGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCATTTTACGATGCCTTATATCACCACAACAAAGAATCAAAGAGGCC 
ACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGCAATAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATT 
TAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGCGTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCAT 
TATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGATGATTGCAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAA 
AAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGCAGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAAT 
TGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGATTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGA 
AAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAATGTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTT
TGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGATTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGAC
AAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAACTGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTAT 
GGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGATAAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTT 
ACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGGAGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGT
GGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTATGGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGC 
TTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGAGGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGG
AGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCTTGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTAT 
TACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGAATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTG 
TCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAGCGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGG 
TAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAATTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTG
AAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAACTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCAC 
CAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAGGACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTC 
TGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACATTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGAT 
TAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAGATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTT
TTAAAAGACTCCTATTATAGTATTAAG

SEQ ID NO. 2811: SAG1552 FROM THE M781 GBS TYPE III STRAIN 

TTTGATGGTAGTTTGTATTTACCACAGGGCTTATTAAAAGAAAATACAAGAACTAACTTTGTTGTTAAAGGTGATACTGTACTTCA 
CAAGCCCACCAATAAACCTTTTGTTGTTAAAGGAGTAGACGTTGAGTCTTCCTTAGCGGGTTATCATCACAACGATTTTCCTATTA
CTCAAAAAACGTATCGTGAATGGTTCCATTTAATTTCCAACATGGGGGCAAATACTGTAAGAGTCAAGGTACCGATGAATGTTGCA
TTTTACGATGCCTTATATCACCACAACAAAGAATCAAAGAGGCCACTGTATTTGTTGCAAGGAATACGTATAGATTCTTATCGGAA
TAATGCTTCTATAACAGCTTTTAATGATAATTATAGGGGGTATTTAAAACGAGAAGCAAAAGGCGTTGTGGATATTCTCCATGGGC 
GTAAGCAAGTATGGAATACTGATTTTGGTAGCCGTCATTATCATTATGATCTTAGTCCTTGGGTACTTGGTTATGTCGTAGGGGAT 
GATTGGAATAGTGGTACTGTCGCTTATACTAATCATCAAGAGAAAAAAACGCAATATAAAGGACGTTATTTTAAAACTTCTGTGGC 
AGCTAATCCATTTGAGGTCATGCTAGCTCAAGTAATGGATGAATTGACACATTATGAGACAGCTAAATATGGTTGGCAACATTTGA
TTAGTTTTTCAAACTCACCAACAACAGACCCTTTTCATTATCGAAAACCATTTGAGGCACAGGCTCCTAAATACGTACAACTAAAT
GTAGAAAATATTCAAGCTAATTCAAATGTTAAAGCAGGTATGTTTGCAGCATATAAAGCTATTGATTTCCATCCTCGATACAAGGA 
TTATCTATTATTTGATAAAGAGAATATCAGTAAAGAAGATAGACAAAAGATTAAAGAACTTTCTTTGTCACAGGGATACGTTAAAC
TGCTAAATGCTTATCACAAAATCCCTGTTCTAGTCACGGGTTATGGCTATTCGACAGCGAGAGGTATTGCCCAAAAAGAAATTGAT 
AAACGTCCTCTGCCGATTAATGAAAAAGAACAAGGTCAGCGTTTACTAGAAGATTATGAATCTTTTATATCATCCGGTAGTTTTGG 
AGCGACTATCAATGCATGGCAAGACGATTGGAATGCAAGGGCGTGGAATACATCTTTCGCCACAAATAAACATAGTCAATTCCTAT 
GGGGGGATGCACAAGTATTTAATCAAGGTTATGGTTTATTAGGCTTTAAAAACGCAAAACATCATTATCAAGTTGATGGTAAAAGA 
GGCAAAGGAGAGTGGAAACATCCTCTGATGACTAGTGCAACAGGAGATGACTTATATGCTAGCAGTGATGAAAGCTATCTCTACCT 
TGCGATTAAAACAAAACCTGAAAAACTAAAAGAAAAACGATTATTACCAATAGATATTACACCAAAATCTGGTAGTAGAAAAATGA
ATGGTAGTAAGGTCACATTTTCTAAATCTAGTGACTTTGTATTGTCTATTGATCCAAATGGCAAGTCTGAATTATTTGTCCAAGAG 
CGCTATAATGCCTTAAAAGCGAACTATCTTCGACAGCTTAACGGTAAAGATTTTTATGCTTTCCCACCAAAGAAGAACAGTAGTAA 
TTTTGAGCAGATAAATATGGTATTGAGAAATACAAAGATTGTTGAAGACATGGAAAAAGTAAAAGCAACAGAGAGGTTCTTACCAA
CTCATCCTACTGGTCTTCTCAAAACAGGAACAACTGATAGGCACCAAAAAACATTTGATTCACAAACAGATATTTCGTTTGGAAAG
GACTTTATAGAGGTCAGAATTCCGTGGCAGTTGTTGAATTTTTCTGATCCATCATCTCAAAAAATTCACGATGATTACTTTAAACA 
TTATGGTGTGAAGGAGTTAGAAATTGAGAGCATTGCTTTAGGATTAGGTGCTAATAGCAAAGAAAACACACTGATAAAGATGGCAG
ATTATCGTTTGAAAAATTGGGAGAGACCCGATACCAAAACCTTTTTAAAAGACTCCTATTATAGTATTAAGAAAGAATGG

>SEQ ID NO 2850:62_1169NT frame: 1

fvvkgdtvlhkptnkpfvvkgvdvesslagyhhndfpitqktyrewfhlisnmgantvrv
kvpmnvafydalyhhnkaskrplyllqgiridsyrnnasitafndnyrgylkreakgvvd
ilhgrkqvwntdfgsrhyhydlspwvlgyvvgddwnsgtvaytnhqekktqykgryfkts
aaanpfevmlaqvmdelthyetakygwqhlisfsnspttdpfryrkpfeaqapkyvqlnv 
eniqansnvkagifaaykaidfhprykdyllfdkeniskedrqkikelslsqgyvkllna 
yhkipvlvtgygystargiaqkeidkrplpinekeqgqrlledyesfissgsfgatinaw 
qddwnarawntsfatnkhsqflwgdaqvfnqgygllgfknakhhyqvdgkrgkgewkhpl
mtsatgddlyassdesylylaiktkpeklkekrllpiditpksgsrkmngskvtfskssd 
fvlsidpngkselfvqerynalkanylrqlngkdfyafppkknssnfeqinmvlrntkiv 
edmekvkaterflpthptgllktgtidrhqktfdsqtdisfgkdfievripwqllnfsdp 
ssqkihddyfkhygvkeleiesialglganskentlikmadyrlknwerpdtktflkdsy 

YSI.ER 

>SEQ ID NO 2851:62_18RS21 frame: 1 

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG 
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS 
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG 
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDTTPKSGSRKMN 
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ 
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR 
IPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWER 
PDTKTFLKDSYYVLRK



>SEQ ID NO 2852:62_2603 frame: 3 

LKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISN
MGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLK
REAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKKTQY
KGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQA
PKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQ
GYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGS 
FGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRG 
KGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSK 
VTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINM 
VLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPW 
QLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDT 
KTFLKDSYYSIKKEWSKERERTYGP

>SEQ ID NO 2853:62_A909 frame: 1

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG 
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS 
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG 
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN 
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ 
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR 
IPWQLLNFSDPSSQRIHDDYFKHYGVKELEN.EPLL.D.VLIAKKTH..RWQIIV.KIGR
DPIPKPF.K 

>SEQ ID NO 2854:62_A909 frame: 1 

KGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG
YLKREAKGVVDILHGRKQVWNTDLGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKK
TQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFE 
AQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELS 
LSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFIS
SGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDG 
KRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMN 
GSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQ
INMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVR
IPWQLLNFSDPSSQRIHDDYFKHYGVKELEN.EPLL.D.VLIAKKTH..RWQIIV.KIGR
DPIPKPF.K 

>SEQ ID NO 2855:62_CJB110 frame: 1 

YYFDGSLYLPKGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPIT
QKTYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNAS 
ITAFNDNYRGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGT
VAYTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTT 
DPFHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISK 
EDRQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQR 
LLEDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHNQFLWGDAQVFNQGYGLLGFK
NAKHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDI 
TPKSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFP
PKKNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDI
SFGKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKM 
ADYRLKNWERPDTKTFLKDSYYVLRK

>SEQ ID NO 2856:62_COH1 frame: 2

LPQGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWF
HLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASITAFNDNY 
RGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQE
KKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKP 
FEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKE 
LSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESF 
ISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQV
DGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRK 
MNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNF
EQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQPDISFGKDFIE 
VRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNW 
ERPDTKTFLKD 

>SEQ ID NO 2857:62_H36B frame: 2 

RGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHL
ISNMGANTVRVKVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRG
YLKREAKGVVDILHGRKQVWNTDFGSSHYHYDLSPWVLGYVVGDDGHSGTVALY

>SEQ ID NO 2858:62_JM9130013 frame: 3

FVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGANTVRV
KVPMNVAFYDALYHHNKASKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAKGVVD
ILHGRKQVWNTDFGSSHYHYDLSPWVLGYVVGDDWNSGTVAYTNHQEKKTQYKGRYFKTS
VAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQAPKYVQLNV 
ENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVKLLNA 
YHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGATINAW 
QDDWNARVWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKEEWKHPL
MTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFSKSSD
FVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRNTKIV 
EDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPWQLLNFSDP 
SSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMADYRLKNWERPDTKTFLKDSY 
YSIKK 

>SEQ ID NO 2859:62_M732 frame: 2 

TRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQKTYREWFHLISNMGAN
TVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASITAFNDNYRGYLKREAK 
GVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDCNSGTVAYTNHQEKKTQYKGRY
FKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDPFHYRKPFEAQAPKYV
QLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKEDRQKIKELSLSQGYVK 
LLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLLEDYESFISSGSFGAT 
INAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNAKHHYQVDGKRGKGEW 
KHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITPKSGSRKMNGSKVTFS 
KSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPKKNSSNFEQINMVLRN
TKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISFGKDFIEVRIPWQLLN 
FSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSRENTLIKMADYRLKNWERPDTKTFL 
KDSYYSIK 

>SEQ ID NO 2860:62_M781 frame: 1 

FDGSLYLPQGLLKENTRTNFVVKGDTVLHKPTNKPFVVKGVDVESSLAGYHHNDFPITQK
TYREWFHLISNMGANTVRVKVPMNVAFYDALYHHNKESKRPLYLLQGIRIDSYRNNASIT 
AFNDNYRGYLKREAKGVVDILHGRKQVWNTDFGSRHYHYDLSPWVLGYVVGDDWNSGTVA
YTNHQEKKTQYKGRYFKTSVAANPFEVMLAQVMDELTHYETAKYGWQHLISFSNSPTTDP 
FHYRKPFEAQAPKYVQLNVENIQANSNVKAGMFAAYKAIDFHPRYKDYLLFDKENISKED
RQKIKELSLSQGYVKLLNAYHKIPVLVTGYGYSTARGIAQKEIDKRPLPINEKEQGQRLL 
EDYESFISSGSFGATINAWQDDWNARAWNTSFATNKHSQFLWGDAQVFNQGYGLLGFKNA 
KHHYQVDGKRGKGEWKHPLMTSATGDDLYASSDESYLYLAIKTKPEKLKEKRLLPIDITP 
KSGSRKMNGSKVTFSKSSDFVLSIDPNGKSELFVQERYNALKANYLRQLNGKDFYAFPPK
KNSSNFEQINMVLRNTKIVEDMEKVKATERFLPTHPTGLLKTGTTDRHQKTFDSQTDISF 
GKDFIEVRIPWQLLNFSDPSSQKIHDDYFKHYGVKELEIESIALGLGANSKENTLIKMAD 
YRLKNWERPDTKTFLKDSYYSIKKEW 

SEQ ID NO. 2901: SAG1641 FROM THE 090 GBS TYPE Ia STRAIN

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGAGCTTTTCTGAGACTGAAAAAGCACGTTG
GGATAAAATTGAAAAGCTAGTAGGCGATAAAGCTAAAATCAAATTCACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG 
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCCCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC 
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAAGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC 
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTATVAGCTATCCAAGCTA 
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGGAACCCAGCTTTCTTG 
TACAA 

SEQ ID NO. 2902: SAG1641 FROM THE 1169NT1 GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

ATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGG
GATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGC 
CAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCAC
TTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCA 
ATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAA 
GGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCA 
AAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCA 
GATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTAT 
CTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2903: SAG1641 FROM THE 18RS21 GBS TYPE II STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCAC

SEQ ID NO. 2904: SAG1641 FROM THE 2603 V/R GBS TYPE V STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG 
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG 
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC 
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCCAGTCAAACACCACGTGCACTC 
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA 
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2905: SAG1641 FROM THE A909 GBS TYPE Ia STRAIN

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG 
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG 
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA 
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC 
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC 
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC 
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2906: SAG1641 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

AAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTGAAAAGCTAGTAGGCGATA 
AAGCTAAAATCAAATTCACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAGGATGTGGATATTAATGCCTTT
CAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAAGACTTACTTAGCCCCAATTCG
TATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATGATGCAACAAATGGTAGCC 
GTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACAGTTGCTAATATCACATCT 
AATAAAAAAGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGATGCAGCTATTATTAATAA
TACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATTCAAAACAATGGATTAATA
TCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCTTATCACACAGATGAAGTG 
AAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGGAA

SEQ ID NO. 2907: SAG1641 FROM THE COH1 GBS TYPE III STRAIN
(REVERSE COMPLEMENT) 

AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAA
TTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG
GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAA 
GACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAA 
ATGATGCAACAAATGGTAGCCGTGCATTGTATGTACTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCA 
ACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGT
AGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAA 
ATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGAT 
GCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2908: SAG1641 FROM THE H36b GBS TYPE Ib STRAIN

AAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGAT 
AAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAA
TAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTG
AAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATT 
CCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGT
TGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAG
ATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGAT 
AAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTT 
GGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2909: SAG1641 FROM THE JM9130013 GBS TYPE VIII STRAIN

TTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAATTG 
AAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAGGAT
GTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAAGAC
TTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAAATG 
ATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCAACA 
GTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGTAGA 
TGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAAATT
CAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTTGGATGCT
TATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

SEQ ID NO. 2910: SAG1641 FROM THE M732 GBS TYPE III STRAIN 

AATCAAGAAGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTG 
GGATAAAATTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAG
CCAATAAGGATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCA
CTTGAAAAGACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGC
AATTCCAAATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGA 
AGGTTGCAACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTC 
AAAGATGTAGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATC 
AGATAAAAATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTA
TCTTGGATGCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATAC

SEQ ID NO. 2911: SAG1641 FROM THE M781 GBS TYPE III STRAIN

AGTTTCAGCAAGCTCAACTTCAAGTAAAGTTGTTAAAGTTGGTGTTATGACCTTTTCTGACACTGAAAAAGCACGTTGGGATAAAA 
TTGAAAAGCTAGTAGGTGATAAAGCTAAAATCAAATTTACAGAATTTACAGATTATACACAACCAAATCAAGCGACAGCCAATAAG 
GATGTGGATATTAATGCCTTTCAACATTACAATTTCTTAGAAAACTGGAATAAGGAAAATAAGAAAAACTTAATTCCACTTGAAAA
GACTTACTTAGCTCCAATTCGTATCTATTCTGAGAAGGTAAAATCTCTTAAAAAATTGAAAAAAGGAGCCACTATTGCAATTCCAA 
ATGATGCAACAAATGGTAGCCGTGCATTGTATGTCCTTCAGTCAGCAGGTTTAATCAAATTGAATGTTTCTGGTAAGAAGGTTGCA 
ACAGTTGCTAATATCACATCTAATAAAAAGGATATTAATATTCAGGAGTTAGATGCGAGTCAAACACCACGTGCACTCAAAGATGT 
AGATGCAGCTATTATTAATAATACATACATTGAGCAAGCTAATTTAAAACCTTCAGATGCTATCTTTGTTGAGAAATCAGATAAAA 
ATTCAAAACAATGGATTAATATCATTGCGGGACGTAAAAATTGGAAAAAGCAAAAGAACGCTAAAGCTATCCAAGCTATCTGGGAT 
GCTTATCACACAGATGAAGTGAAAAAAGTTATCAAAGATACTTCAGCTGATATTCCACAATGG

>SEQ ID NO 2950: 35_090 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK 
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIPQWNPAFLY

>SEQ ID NO 2951: 35_1169NT frame: 3 

QEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKD
VDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDAT
NGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIIN 
NTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKK
VIKDTSADIPQW 

>SEQ ID NO 2952: 35_18RS21 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII 
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIP 

>SEQ ID NO 2953:35_2603 frame: 1

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKDTSADIPQW

>SEQ ID NO 2954:35_A909 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA 
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK 
KVIKDTSADIPQW

>SEQ ID NO 2955:35_CJB110 frame: 2

SKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVDINAFQHY
NFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNGSRALYVL 
QSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTYIEQANL 
KPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVIKDTSADI 
PQW 

>SEQ ID NO 2956:35_COH1 frame: 2

VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVD
INAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNG
SRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNT 
YIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVI
KDTSADIPQW 

>SEQ ID NO 2957:35_H36B frame: 3 

EVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDV
DINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATN 
GSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINN 
TYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKV
IKDTSADIPQW 

>SEQ ID NO 2958:35_JM9130013 frame: 2 

SASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVDI
NAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNGS
RALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTY 
IEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVKKVIK
DTSADIPQW 

>SEQ ID NO 2959:35_M732 frame: 1 

NQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANK 
DVDINAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDA
TNGSRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAII 
NNTYIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAILDAYHTDEVK
KVIKD 

>SEQ ID NO 2960:35_M781 frame: 2 

VSASSTSSKVVKVGVMTFSDTEKARWDKIEKLVGDKAKIKFTEFTDYTQPNQATANKDVD
INAFQHYNFLENWNKENKKNLIPLEKTYLAPIRIYSEKVKSLKKLKKGATIAIPNDATNG
SRALYVLQSAGLIKLNVSGKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNT 
YIEQANLKPSDAIFVEKSDKNSKQWINIIAGRKNWKKQKNAKAIQAIWDAYHTDEVKKVI
KDTSADIPQW 

SEQ ID NO. 3001: SAG2147 FROM THE 1169NT1 GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCC
AAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCT
CCAAAACCTTCTCAGGCATCTAATGAAGTCCCAAAATCAAGTTCTCAATCTACAGAAGCT 
AATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACA
GAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTAC
AAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGCG
GTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGG 
GAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCT 
TCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTT 
AATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 3002: SAG2147 FROM THE 18RS21 GBS TYPE II STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTC
GCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAA
AACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTA
CAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAG
TTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGA
CAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTG
CAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGT
CTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCT 
CAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGG 
ATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 3003: SAG2147 FROM THE 2603 V/R GBS TYPE V STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGT
TCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGT
AAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATC
TACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGC
AGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGA
GACAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATAC
TGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCA
GTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGC
CTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCA
GGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA
C

SEQ ID NO. 3004: SAG2147 FROM THE 090 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

TAGCCAAAAAATCAAAAATGATTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAAC
AGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAG
AAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTG 
TAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAA 
CTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTGCAG 
GGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTA 
CTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAG 
GAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGA

SEQ ID NO. 3005: SAG2147 FROM THE A909 GBS TYPE Ia STRAIN
(REVERSE COMPLEMENT) 

AAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCA 
TCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTT
ACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACC
AGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG 
ACAAGTGGCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCA
GCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGT 
GAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACG
ATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGAATCAAGTTAATTCAGCTATTAAAGCT
TATCGTGCTCAAGGTTTATCA 

SEQ ID NO. 3006: SAG2147 FROM THE CJB110 GBS NONTYPEABLE STRAIN 
(REVERSE COMPLEMENT) 

AATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGA
CATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATG
AAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGA
GTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGG 
CACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACGAGTG
GCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAA
TGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAA
ATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAG 
GTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTG
CTCAAGGTTTATCAGCTTGGGGTTAC

SEQ ID NO. 3007: SAG2147 FROM THE COH1 GBS TYPE III STRAIN
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAA
AGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGA
TGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCA
ATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACA
AGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTAC
TGAGACAACTTACAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAA
TACTGCAGGGGCGGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCC
TCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAA
TGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGT 
TCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGG 
TTAC 

SEQ ID NO. 3008: SAG2147 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGC
AGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGT
AGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAG
TTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGT 
AGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGC
TGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGTAA
TGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGG
AGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGT 
TGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGC
TACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTT 

SEQ ID NO. 3009: SAG2147 FROM THE M732 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGC
CAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGC
TCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGC
TAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAAC
AGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA
CAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGC
GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTG
GGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGC
TTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGT
TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA

SEQ ID NO. 3010: SAG2147 FROM THE M781 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GTAACCCCAAGCTGATAAACCTTGAGCACGATAAGCTTTAATAGCTGAATTAACTTGATC 
CTGAACTGTAGCTGTTGAACCCCAACCTGGCATCGTTTGGAAAAGTCCTGAAGCTCCTGA 
GGCATTAGCAACATTAGGATTACCATTTGATTCACGGGCAATAATATGTTCCCAAGTAGA
CTGAGGGACTCCTGTTGCAGCAGCCATTTGTGCTGCAGCAGCAGATCCGACCGCCCCTGC 
AGTATTTCCATTGCTCAATACTTGGCCACTTGTCTGGTGTTGAGCAGGTTTGTAAGTTGT 
CTCAGTAACAGCATAAGTTTGTTGTGCCTGACTGGTAGCAGGGGTATTTTCTGTTACAAC
TGCTTGTTCTACAGCCGCCTCTTCACTCGCAGTAACTTGTTGCTGAGAATTAGCTTCTGT 
AGATTGAGAACTTGATTTTGGGGCTTCATTAGATGCCTGAGAAGGTTTTGGAGCCTGTTT 
TACATCTTCTACTTTTGATTTAGATGTCGCCTTAGTCATTTTTGATTTTTTGGCTACGCG 
AACTTTATCTGCTTTTGACAAAGA 

>SEQ ID NO 3050: 25_1169NT frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEVPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3051:25_18RS21 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3052:25_2603 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI 
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3053:25_090 frame: 3

AKKSKMIKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVV
TENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQST
WEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQ 

>SEQ ID NO 3054:25_A909 frame: 1 

KATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENTPAT
SQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHIIAR 
ESNGNPNVANASGASGLFQTMPGWGSTATVQNQVNSAIKAYRAQGLS 

>SEQ ID NO 3055:25_CJB110 frame: 3 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS 
EEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY 

>SEQ ID NO 3056:25_COH1 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

>SEQ ID NO 3057:25_H36B frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKA 

>SEQ ID NO 3058:25_M732 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWG 

>SEQ ID NO 3059:25_M781 frame: 4 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS 
EEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAVGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY 

SEQ ID NO. 3101: SAG2148 FROM THE 1169NT1 GBS TYPE V STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3102: SAG2148 FROM THE 18RS21 GBS TYPE II STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3103: SAG2148 FROM THE 2603 V/R GBS TYPE V STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG
GCTGGTAT 

SEQ ID NO. 3104: SAG2148 FROM THE 090 GBS TYPE Ia STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA 
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3105: SAG2148 FROM THE A909 GBS TYPE Ia STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3106: SAG2148 FROM THE CJB110 GBS NONTYPEABLE STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTAAAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGTTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3107: SAG2148 FROM THE COH1 GBS TYPE III STRAIN

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3108: SAG2148 FROM THE H36b GBS TYPE Ib STRAIN
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3109: SAG2148 FROM THE JM9130013 GBS TYPE VIII STRAIN 
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAAGAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGACGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAACTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCCGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG
GCTGGTAT 

SEQ ID NO. 3110: SAG2148 FROM THE M732 GBS TYPE III STRAIN 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTGTCTCTCAA 
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC 
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG 
GCTGGTAT 

SEQ ID NO. 3111: SAG2148 FROM THE M781 GBS TYPE III STRAIN 
(REVERSE COMPLEMENT) 

GCATCTTATACCGTGAAATCAGGTGATACCTTATCAGCTATTGCTAAAAATCATAAAACTACGGTACAATAGTTAGTGTCTCTCAA
TAGTATCAGTAACGCTGATGTCATCAGTATAGGTGATGTTTTAAAATTGGATAATTCTACAGCTAGTCAAGCAGAAGCAAAATCTC 
AACCAACAATTGAAAATTCAATGAATTCTTCATCAAATTTGAGTTCAAGTGATTCAGCTGCAAAAGAAGAAATAGCTCGTCGTGAA
TCAAATGGTAGTTATACTGCACAGAATGGACAATATTATGGAAGATATCAACTGTCTCAATCTTACCTAAATGGCGACTTATCTCC
TGAAAATCAAGAAAAAGTAGCGGACAATTATGTGGCTTCTCGTTACGGATCTTGGTCGGCAGCGCTATCATTTTGGAATAGTAACG
GCTGGTAT 

>SEQ ID NO 3150:15_1169NT frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK 
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3151:15_18RS21 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK 
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3152:15_2603 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK 
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3153:15_090 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3154:15_A909 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3155:15_CJB110 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSKASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVVSRYGSWSAALSFWNSNGWY

>SEQ ID NO 3156:15_COH1 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3157:15_H36B frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3158:15_JM9130013 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQELVSLNSISNADVISIGDVLKLDNSTTSQAEAKSQPT 
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3159:15_M732 frame: 1 

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

>SEQ ID NO 3160:15_M781 frame: 1

ASYTVKSGDTLSAIAKNHKTTVQ.LVSLNSISNADVISIGDVLKLDNSTASQAEAKSQPT
IENSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQSYLNGDLSPENQEK
VADNYVASRYGSWSAALSFWNSNGWY

SEQ ID NO 4001 : SAG0653 FROM THE 2603 V/R GBS TYPE V STRAIN 

ATGAAGAAAGTGTTAGTGAGTAGTCTTTTGGTTTTAGGGATTACGATA
ACGTTACAAACAGTAGTTGAGGCTAAGGGGCCAAAAGTAGCTTATACACAAGAGGGAATG 
ACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTAGTATTTCTATTGACGAGATTCAA
AAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTTGATATTGATGATACACTGCTT
TTCAGTAGTCAATATTTTCAATATGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTT
CTTCATAAACAAAAATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCC
AAAGAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTTT
ATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATAAAACAGCTAAA 
GCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCTGTAAATTATACAGGCGATAAA
CCTAAAAAGCCATACAAATATGATAAATCATATTATATTAAGAAATATGGTTCAGACATT
CATTATGGAGATAGTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATT
AGAATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGGT 
GAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4002 : SAG0653 FROM THE 090 GBS TYPE III STRAIN

AAGGGGCCAAAAGTAGCTTATACACAAGAGGGAATGAC
TGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTTCTATTGACG
AGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTTGAT
ATTGATGATACACTACTTTTCAGTAGTCAATATTTTCAATATGGTAAAGA
ATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAAAATTCTGGG 
ATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAAGAATATGCT
AAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTTTAT
AACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATAAAA
CAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCTGTA
AATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGATAAATCATA
TTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATAGTGATGACG
ATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGAATTTTAAGA
GCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGGTGA
AGAGGTTCTCGAAAATTCAGCTTAC 

SEQ ID NO 4003 : SAG0653 FROM THE A909 GBS TYPE Ia STRAIN

AAGGGGCCAAAAGTAGCTTATACACA
AGAGGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTA
TTTCTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACT
GTTAGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCA 
ATATGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAAC 
AAAAATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCC 
AAAGAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAA
AATTGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCG
AGGTTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAA
CCAATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATA
TGATAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAG
ATAGTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATT 
AGAATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGG
AGGCTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC 

SEQ ID NO 4004 : SAG0653 FROM THE 18RS21 GBS TYPE II STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGA
GGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTT
CTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTT
AGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATA
TGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAA 
AATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAA 
GAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAAT
TGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGG 
TTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCA
ATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGA
TAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATA
GTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGA
ATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGG
CTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4005 : SAG0653 FROM THE M732 GBS TYPE III STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGA 

GGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTT
CTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTT
AGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATA
TGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAA
AATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAA
GAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAAT
TGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGG
TTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCA
ATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGA
TAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATA
GTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGA
ATTTTAAGAGCACCTAATTCTACAAATCTACCTTTAGCAGAAGCTGGAGG
CTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4006 : SAG0653 FROM THE COH1 GBS TYPE III STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGAGGGAATGACT
GCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTTCTATTGACGA
GATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTTGATA
TTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATATGGTAAAGAA
TATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAAAATTCTGGGA
TCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAAGAATATGCTA
AAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTTTATA
ACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATAAAAC
AGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCTGTAA
ATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGATAAATCATAT
TATATTAAGAAATATGGTTCAGACATTCATTATGGAGATAGTGATGACGA
TATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGAATTTTAAGAG
CACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGGTGAA
GAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4007 : SAG0653 FROM THE M781 GBS TYPE III STRAIN 

AAGGGGCCAAAAGTAGCTTATACACA 

AGAGGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTA
TTTCTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACT
GTTAGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCA
ATATGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAAC
AAAAATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCC
AAAGAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAA
AATTGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCG
AGGTTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAA
CCAATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATA
TGATAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAG
ATAGTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATT
AGAATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGG
AGGCTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4008 : SAG0653 FROM THE CJB110 GBS NONTYPEABLE STRAIN

AAGGGGCCAAAAGTAGCTTATACACAAGA

GGGAATGACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTT
CTATTGACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTT
AGTTTTGATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATA
TGGTAAAGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAA
AATTCTGGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAA
GAATATGCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAAT
TGTTTTTATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGG
TTGATAAAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCA
ATTGCTGTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGA
TAAATCATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATA
GTGATGACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGA
ATTTTAAGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGG
CTACGGTGAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4009 : SAG0653 FROM THE JM9130013 GBS TYPE VIII STRAIN 

AAGGGGCCAAAAGTAGCTTATACACAAGAGGGAAT 

GACTGCTCTTTCGGACACAAATAAAGATAAAGTCACTACTATTTCTATTG
ACGAGATTCAAAAAAGCTTAGAAGGTAAGAAGCCGATTACTGTTAGTTTT
GATATTGATGATACACTGCTTTTCAGTAGTCAATATTTTCAATATGGTAA
AGAATATGTAACTCCTGGATCGTTTGATTTTCTTCATAAACAAAAATTCT
GGGATCTTGTTGCAAAACGAGGAGATCAAGATTCCATTCCCAAAGAATAT
GCTAAAAAATTAATTGCTATGCATCAAAAACGAGGAGATAAAATTGTTTT
TATAACAGGTAGGACAAGAGGGTCAATGTATAAGGAGGGCGAGGTTGATA
AAACAGCTAAAGCCTTAGCTAAAGATTTTAAATTAGACAAACCAATTGCT
GTAAATTATACAGGCGATAAACCTAAAAAGCCATACAAATATGATAAATC
ATATTATATTAAGAAATATGGTTCAGACATTCATTATGGAGATAGTGATG
ACGATATTCATGCAGCTAGGGAGGCCGGTGCTAGACCAATTAGAATTTTA
AGAGCACCTAATTCTACAAATCTACCTTTACCAGAAGCTGGAGGCTACGG
TGAAGAGGTTCTCGAAAATTCAGCTTAC

SEQ ID NO 4010 : SAG0653 FROM THE 2603 V/R GBS TYPE V STRAIN

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY 
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS 
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4011 : SAG0653 FROM THE 090 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY 
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS 
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4012 : SAG0653 FROM THE A909 GBS TYPE Ia STRAIN

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4013 : SAG0653 FROM THE 18RS21 GBS TYPE II STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY 
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4014 : SAG0653 FROM THE COH1 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD 
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY 

SEQ ID NO 4015 : SAG0653 FROM THE M781 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY

SEQ ID NO 4016 : SAG0653 FROM THE CJB110 GBS NONTYPEABLE STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY

SEQ ID NO 4017 : SAG0653 FROM THE JM9130013 GBS TYPE VIII STRAIN

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY

SEQ ID NO 4018 : SAG0653 FROM THE M732 GBS TYPE III STRAIN 

KGPKVAYTQEGMTALSDTNKDKVTTISIDEIQKSLEGKKPITVSFDIDDTLLFSSQYFQY
GKEYVTPGSFDFLHKQKFWDLVAKRGDQDSIPKEYAKKLIAMHQKRGDKIVFITGRTRGS
MYKEGEVDKTAKALAKDFKLDKPIAVNYTGDKPKKPYKYDKSYYIKKYGSDIHYGDSDDD
IHAAREAGARPIRILRAPNSTNLPLPEAGGYGEEVLENSAY

SEQ ID NO. 4101: SAG0649 FROM 2603 V/R GBS TYPE V STRAIN 
ATGAAAAAGAGACAAAAAATA

TGGAGAGGGTTATCAGTTACTTTACTAATCCTGTCCCAAATTCCATTTGGTATATTGGTA
CAAGGTGAAACCCAAGATACCAATCAAGCACTTGGAAAAGTAATTGTTAAAAAAACGGGA
GACAATGCTACACCATTAGGCAAAGCGACTTTTGTGTTAAAAAATGACAATGATAAGTCA
GAAACAAGTCACGAAACGGTAGAGGGTTCTGGAGAAGCAACCTTTGAAAACATAAAACCT
GGAGACTACACATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAACC
TGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGGATGCAGATAAA
GCAGAGAAACGAAAAGAAGTTTTGAATGCCCAATATCCAAAATCAGCTATTTATGAGGAT
ACAAAAGAAAATTACCCATTAGTTAATGTAGAGGGTTCCAAAGTTGGTGAACAATACAAA
GCATTGAATCCAATAAATGGAAAAGATGGTCGAAGAGAGATTGCTGAAGGTTGGTTATCA
AAAAAAATTACAGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAATTAACTGTT
GAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATGTCGTTGTGCTA
TTAGATAATTCAAATAGTATGAATAATGAAAGAGCCAATAATTCTCAAAGAGCATTAAAA
GCTGGGGAAGCAGTTGAAAAGCTGATTGATAAAATTACATCAAATAAAGACAATAGAGTA
GCTCTTGTGACATATGCCTCAACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGA
GTTGCCGATCAAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAACT
ACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATGATGCTAACGAA
GTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAGCATATAAATGGGGATCGCACG
CTCTATCAATTTGGTGCGACATTTACTCAAAAAGCTCTAATGAAAGCAAATGAAATTTTA
GAGACACAAAGTTCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCT
ACGATGTCTTATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTTT
AATTCTTTTTTAAATAAAATACCAGATAGAAGTGGTATTCTCCAAGAGGATTTTATAATC
AATGGTGATGATTATCAAATAGTAAAAGGAGATGGAGAGAGTTTTAAACTGTTTTCGGAT
AGAAAAGTTCCTGTTAGTGGAGGAACGACACAAGCAGCTTATCGAGTACCGCAAAATCAA
CTCTCTGTAATGAGTAATGAGGGATATGCAATTAATAGTGGATATATTTATCTCTATTGG
AGAGATTACAACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAAA
CAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATATAAGACCTAAA
GGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAGATCCTGGTGCAACTCCTCTT
GAAGCTGAGAAATTTATGCAATCAATATCAAGTAAAACAGAAAATTATACTAATGTTGAT
GATACAAATAAAATTTATGATGAGCTAAATAAATACTTTAAAACAATTGTTGAGGAAAAA
CATTCTATTGTTGATGGAAATGTGACTGATCCTATGGGAGAGATGATTGAATTCCAATTA
AAAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGTTGGAAATGATGGCAGTCAA
TTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATGGGGGAATTTTAAAAGATGTT
ACAGTGACTTATGATAAGACATCTCAAACCATCAAAATCAATCATTTGAACTTAGGAAGT
GGACAAAAAGTAGTTCTTACCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAA
TTTTACAATACAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATACT
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGTACTAACCATC
AGTAATCAGAAGAAAATGGGTGAGGTTGAATTTATTAAAGTTAATAAAGACAAACATTCA
GAATCGCTTTTGGGAGCTAAGTTTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAG
CAATTTGTTCCAGAGGGAAGTGATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAA
GCACTTCAAGATGGTAACTATAAATTATATGAAATTTCAAGTCCAGATGGCTATATAGAG
GTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGAGAAGTTACGAACCTGAAA
GCAGATCCAAATGCTAATAAAAATCAAATCGGGTATCTTGAAGGAAATGGTAAACATCTT
ATTACCAACACTCCCAAACGCCCACCAGGTGTTTTTCCTAAAACAGGGGGAATTGGTACA
ATTGTCTATATATTAGTTGGTTCTACTTTTATGATACTTACCATTTGTTCTTTCCGTCGT
AAACAATTG

SEQ ID NO. 4102: SAG0649 FROM 090 GBS TYPE Ia STRAIN

G GT G AAAC C C AAG AT AC C AAT C AAG C AC T T G GAAAAG 
T AAT TGTTAAAAAAACGGG AG AC AAT GCT AC ACC ATT AGGCAAAGCGACT 
T T T G T G T T AAAAAAT GAC AAT GAT AAGT C AG AAAC AAGT C ACG AAAC GGT 
AG AGG GT T C T GG AGAAG C AAC C T T T GAAAAC AT AAAAC CT G GAG AC T AC A 
CATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAACC 
T GG AAAGT T AAAGT T G C AG AT AACG G AG C AAC AAT AAT C G AGG GT AT GG A 
T G C AGAT AAAG C AG AG AAAC GAAAAG AAGTTTT GAAT G C C C AAT AT C C AA 
AAT C AG C TAT T TAT GAG GAT AC AAAAG AAAAT T AC C CAT T AG T T AAT GT A 
G AGG GT T C C AAAG T T GG T G AAC AAT AC AAAG CAT T GAAT C C AAT AAAT GG 
AAAAGAT GGT CG AAG AG AG AT T GCT G AAGGTT GGT T AT C AAAAAAAAT T A 
C AG GG G T C AAT GAT C T C GAT AAG AAT AAAT AT AAAAT T G AAT T AAC T GT T 
G AGG GT AAAAC C AC T GTT G AAAC G AAAG AAC T T AAT C AAC C AC TAG AT G T 
CGT T GT GCT AT T AGAT AAT T C AAAT AG TAT G AAT AAT G AAAG AGCC AAT A 
ATTCTCAAAGAGCATTAAAAGCTGGGGAAGCAGTTGAAAAGCTGATTGAT 
AAAAT T AC AT C AAAT AAAG AC AAT AG AGT AG C T C T T G T GAC AT AT G C C T C 
AACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATC 
AAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATCATAAAACT
AC T T T T AC AG C AAC T AC AC AT AAT T AC AGT T AT T T AAAT TT AAC AAAT GA 
TGCT AACGAAGTT AAT ATT CT AAAGT C AAGAATT CCAAAGGAAGCGGAGC 
ATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAA 
AAAG C T CT AAT G AAAG CAAAT GAAAT T T T AGAG AC AC AAAGT T C T AAT G C 
T AGAAAAAAAC T T AT T T T T C AC GT AAC T GAT GGT GT C C C T ACG AT GT C T T 
ATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAAAACCAGTTT 
AAT T CT T T T T T AAAT AAAAT AC C AG AT AG AAGT GGT AT T CT C C AAG AGG A 
T T T TAT AAT C AAT GGT GAT GAT TAT CAAAT AG T AAAAG GAG AT G G AG AGA 
GT T T T AAAC T GT T T T CG GAT AG AAAAGT T C C T GT T AC T GG AG G AACG AC A 
C AAG C AG C T TAT CG AGT AC C G C AAAAT C AACT C T CT GT AAT G AG T AAT G A 
GGGATATGCAATTAATAGTGGATATATTTaTCTCTATTGGAGAGATTACA 
ACT G GGT C T AT C CAT T T GAT C C T AAG AC AAAG AAAG T T T C T GC AAC G AAA 
CAAAT CAAAACT CAT GGTG AGCCAAC AACATT AT ACTTT AAT GG AAAT AT 
AAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAG 
ATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAATCAATATCA 
AGT AAAAC AG AAAAT TAT ACT AAT GT T GAT GAT AC AAAT AAAAT TT AT G A 
T G AG CT AAAT AAAT AC T T T AAAAC AAT TG T T G AGGAAAAAC AT T CT AT T G 
T T GAT G GAAAT GT G ACT GAT CC T AT GGG AG AG AT GAT T G AAT T C C AAT T A 
AAAAAT GGT C AAAGT T T T AC AC AT GAT GAT T AC GtTTTGGtT GG AAAT G A 
tGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATG 
GGGGAAT T T T AAAAGAT G T T AC AGT G AC T TAT GAT AAG AC AT C T C AAAC C 
AT C AAAAT C AAT CAT T T G AACT T AGGAAGT GG AC AAAAAGT AGT T C T T AC 
CTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAATA 
CAAAT AAT CGT AC AACG CT AAGT CCGAAGAGTGAAAAAGAACC AAAT ACT 
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGT 
AC T AAC CAT C AGT AAT C AGAAGAAAAT GG GT GAG G T T G AAT T T AT T AAAG 
TTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAGTTTCAACTT 
C AGAT AG AAAAAG AT T T T T CT GGGT AT AAG C AAT T T GT T CC AG AG G G AAG 
T GAT GT T AC AAC AAAG AAT GAT G GT AAAAT T TAT T T T AAAGC AC T T C AAG 
AT GGT AACT AT AAAT TAT AT GAAAT T T C AAGT CC AG AT GGCT AT AT AGAG 
GT T AAAAC G AAAC C T G T T G T GAC AT T T AC AAT T C AAAAT GG AG AAGT T AC 
G AAC CT G AAAG C AGAT C CAAAT G C T AAT AAAAAT CAAAT C GG GT AT C T T G 
AAGG AAAT G GT AAAC AT CTT AT T AC C AAC ACT CC C AAAC GC C C AC C AGGT 
GTT 

SEQ ID NO. 4103: SAG0649 FROM A909 GBS TYPE Ia STRAIN

GGT GAAAC C C AAG AT AC C AAT C AAG C ACT T G G AAAA 

GT AAT T GT T AAAAAAACGGG G GAC AAT G CT AC AC CAT TAG G C AAAG C GAC 
TTTTGTGT T AAAAAAT GAC AAT GAT AAGT CAg AAAC AAGT C AC G AAAC GG 
TAGAGGGTTCTGGAGAAgCAACCTTTGAAAACATAAAACCTGGAGACTAC 
ACATTAAGAGAAGAAACAGCACCAATTGGTTATAAAAAAACTGATAAAAC 
C T G G AAAG T T AAAGT T GC AGAT AAC GG AG C AAC AAT AAT CG AG GGT AT GG 
AT GC AGAT AAAG C AG AG AAAC GAAAAGAAG T T T T G AAT G C C C AAT AT C C A 
AAAT C AGCT AT T T AT G AGG AT AC AAAAG AAAATT ACC CAT T Ag T T AAT G T 
AG AG GGT T C C AAAGT T GGT G AAC AAT AC AAAG CAT T GAAT C C AAT AAAT G 
GAAAAGATGGT CGAAGAG AGAT T GCTGAAGGTTGGTT AT C AAAAAAAAT T 
ACAGGGGT C AAT GATCT CGAT AAGAAT AAAT AT AAAATTG AATT AACTGT 
T G AGG GT AAAAC C AC T GT T G AAACG AAAGAAC T T AAT C AAC C AC T AGAT G 
TCGTTGTGC TAT TAG AT AAT T CAAAT AG TAT GAAT AAT G AAAG AG C C AAT 
AAT T C T C AAAG AG CAT T AAAAG C T GG GG AAG C AGT T G AAAAG C T GAT T G A 
T AAAATT AC AT C AAAT AAAGAC AAT AG AGT AGCT CTTGTGAC AT ATGCCT 
CAACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGAT 
C AAAAT GGT AAAG C G C T GAAT GAT AGT G TAT CAT GGG AT TAT CAT AAAAC 
TACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATG 
AT G CT AAC G AAGT T AAT AT T C T AAAGT C AAG AAT T C C AAAG G AAGCGG AG 
CAT AT AAAT G G GG AT C G C ACG C T C T AT C AAT T T G GT G CGAC AT T T AC T C A 
AAAAG C T CT AAT G AAAG CAAAT GAAAT T T TAG AG AC AC AAAGT T C T AAT G 
CTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCT 
TAT G C CAT AAAT T T T AAT C CT TAT AT AT C AAC AT C T T AC C AAAAC CAG T T 
T AAT T C T T T T T T AAAT AAAAT AC CAG AT AG AAGT GG T AT TC T C C AAG AG G 
ATT T T AT AAT C AAT GGT GAT GAT TAT CAAAT AGT AAAAG GAG AT GG AG AG 
AGT T T T AAAC T G T T T T CGG AT AG AAAAG T T C CT GT TACT GG AG G AAC GAC 
ACAAGCAGCTTATCGAGTACCGCAAAATCAACTCTCTGTAATGAGTAATG 
AGGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGAGAGATTAC
AACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAA 
AC AAAT C AAAAC T C AT GGT G AG C C AAC AAC AT TAT AC T T T AAT G G AAAT A 
T AAGAC CT AAAGG T TAT G AC AT T T T TACT GT T G G GAT T GGT GT AAACGG A 
GAT C C T GGT GC AAC T C CT CT T GAAGCT GAG AAAT T TAT G C AAT C AAT AT C 
AAGT AAAAC AGAAAAT TAT ACT AAT GT T GAT G AT AC AAAT AAAATT T AT G 
ATGAG CT AAAT AAAT AC T T T AAAAC AAT T GT T GAG G AAAAAC AT T CT AT T 
GT T G AT GGAAAT GT G ACT GAT C C T AT GGGAG AG AT GAT T G AAT T C C AAT T 
AAAAAAT GGT C AAAG T T T T AC AC AT GAT GAT T AC G tTTTGGtT GGAAAT G 
At GG C AGT C AAT T AAAAAAT GGTGTGGCTCTTGGTG G AC C AAAC AGT GAT 
GGGG G AAT T T T AAAAGAT G T T AC AGT G ACTT AT GAT AAGAC AT CT C AAAC 
CAT C AAAAT C AAT CAT T T G AACT T AGGAAGT G GAC AAAAAGT AGT T C T T A 
CC T AT G AT GT AC GT T T AAAAGAT AAC TAT AT AAGT AAC AAAT T T T AC AAT 
AC AAAT AAT C GT AC AAC GC T AAGT C CG AAGAGT G AAAAAG AAC C AAAT AC 
TATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGG 
T ACTAAC CAT CAGTAAT CAGAAG AAAAT G GGT GAGGTT GAAT TTATT AAA 
GT T AAT AAAGAC AAAC AT T C AG AAT C GC T T T T G GG AG C T AAGT T T C AAC T 
TCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCAGAGGGAA 
GT GAT GT T AC AAC AAAG AAT GAT G GT AAAAT T TAT T TT AAAGC AC T T C AA 
GAT G GT AACT AT AAAT TAT AT G AAAT T T CAAGT C CAGAT GG CT AT AT AG A 
GGTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGAGAAGTTA 
C G AAC C T GAAAG CAGAT C C AAAT GC T AAT AAAAAT C AAAT C G GG T AT C T T 
G AAG GAAAT GGT AAAC AT CT T AT T AC C AAC AC T C C C AAACGC CC AC C AG G 
TGTT 

SEQ ID NO. 4104: SAG0649 FROM 18RS21 GBS TYPE II STRAIN 

GGT G AAAC C C AAGAT AC C AAT C AAG C AC 

TTGGAAAAGTAATTGTTAAAAAAACGGGAGACAaTGCTACACCaTTAGGC 
AAAG C GAC TTTTGTGT T AAAAAAT GAC AAT GAT AAG T C AG AAAC AAG T C A 
C GAAACG G T AG AGG G T T C T GGAG AAg C AAC CT T T GAAAAC AT AAAAC C T G 
GAGACTACAC ATT AAGAGAAGAAACAGCAC CAAT T GGTTAT AAAAAAACT 
GATAAAACCTGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGA 
GG GT AT GGAT G CAGAT AAAG C AG AGAAAC G AAa AG AAG T T T T GAAT G C C C 
AAT AT C C AAAAT CAG C T AT T TAT GAG GAT AC AAAAG AAAAT T AC C CAT T A 
GT T AAT GT AG AGGG T T C C AAAGT T GGT GAAC AAT AC AAAGC AT T GAAT C C 
AAT AAAT G G AAAAG AT GGT C G AAGAG AG AT T G C T GAAG GT T GGT TAT C AA 
AAAAAAT TaCaGGGGT CAAT G AT CT C GAT AAG AAT AAAT AT AAAAT T G AA 
TTAACTGTTGAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACC 
AC T AGAT GTCGTTGTGC TAT TAG AT AAT T C AAAT AGT AT GAAT AAT G AAA 
GAGC CAAT AAT T CT C AAAG AG CAT T AAAAG C T G GG GAAG C AGT T GAAAAG 
C T GAT T GAT AAAAT T AC AT C AAAT AAAGAC AAT AG AGT AG CT C T T GT G AC 
AT AT G C CT C AAC CAT T T T T GAT G GT AC T G AAGC G AC C GT AT C AAAGGGAG 
T T G C C GAT C AAAAT G GT AAAG C G CT GAAT GAT AGT GT AT CAT GG GAT TAT 
CAT AAAAC T ACTTT T AC AG C AACT ACAC AT AAT T AC AG T TAT T T AAAT T T 
AACAAATGATGCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGG 
AAG C G GAG CAT AT AAAT GG GG AT CG C AC G C T C T AT CAAT T T GGT G CG AC A 
T T T ACT C AAAAAG C T C T AAT GAAAG C AAAT GAAAT T T TAG AG AC AC AAAG 
TTCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTA 
CGATGTCTTATGCCATAAATTTTAATCCTTATATATCAACATCTTACCAA 
AAC C AGT T T AAT T CT T T T T T AAAT AAAAT AC CAG AT AG AAGT GGT AT T C T 
CCAAGAGGATTTT AT AATC AAT GGT GATGATT AT CAAAT AGT AAAAGGAG 
AT GGAG AG AG T T T T AAAC T GT T T T C GG AT AG AAAAGT T CCTGTTACTG G A 
GGAAC G AC AC AAG C AGC T TAT C GAG T AC C G C AAAAT C AAC T C T C T GT AAT 
GAGTAATGAGGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGA 
GAGATTACAACTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCT 
G C AAC G AAAC AAAT C AAAAC T CAT G GT G AG C C AAC AAC AT TAT AC T T T AA 
T G GAAAT AT AAG AC CT AAAGG TT AT GAC AT T T T T AC T GT T G GGAT T G GT G 
TAAACGGAGATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAA 
T CAATAT CAAGT AAAAC AGAAAAT TAT ACT AAT GT TGATGAT ACAAAT AA 
AATT TATGAT GAGCT AAAT AAAT ACT TTAAAAC AAT TGTT GAGGAAAAAC 
AT T C TAT TGTT GAT G GAAAT GT GAC T GAT C C T AT G G GAG AG AT GAT T GAA 
T T C CAAT T AAAAAAT GGT C AAAG T T T T AC AC AT GAT G AT T AC GT T T T G GT 
TGGAAATGATGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAA 
AC AGTG AT GGG G G AAT T T T AAAAG AT GT T AC AG T GAC T T AT GAT AAG AC A 
TCTCAAACCATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGT
AGTTCTTACCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAAT 
T T T ACAAT AC AAAT AAT C GT AC AAC G CT AAGT C C G AAGAGT G AAAAAGAA 
CCAAATACTATtcGtgATTtCCCAATTCCCAAAATTCGTGATGTTCGTGA 
GT T T C C G GT AC T AAC CAT C AGT AAT CAGAAG AAAAT GGGT GAG GT T GAAT 
TTATTAAAGTTAATAAAGACAAACATTCAGAATCGCTTTTGGGAGCTAAG 
TTTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCC 
AGAG G GAAGT GAT GT T AC AAC AAAG AAT G AT GGT AAAAT T TAT T T T AAAG 
C ACT T C AAGAT G G T AACT AT AAAT TAT AT GAAAT T T C AAGT C C AG AT G G C 
T AT AT AG AG GT T AAAAC GAAAC CT GT T GT GAC AT T TAG AAT T C AAAAT G G 
AG AAGT T ACG AAC C T GAAAG C AG AT C C AAAT G CT AAT AAAAAT C AAAT C G 
GGT AT C T T GAAG GAAAT G GT AAAC AT C T TAT T AC C AAC AC T C C C AAAC G C 
C C AC C AG GT GT T 

SEQ ID NO. 4105: SAG0649 FROM M732 GBS TYPE III STRAIN 

GGT GAAAC C C AAGAT AC C AAT C AAGC ACT 

TGGAAAAGTAATTGTTAAAAAAACGGGAGACAaTGCTACACCATTAGGCA 
AAG C GAC TTTTGTGT T AAAAAAT GAC AAT GAT AAGT C AG AAAC AAG T C AC 
G AAACGGT AGAGGGT T C T G GAG AAGC AACCT T T GAAAAC AT AAAAC CT GG 
AG AC T AC AC AT T AAGAG AAGAAAC AG C AC CAAT T GGT T AT AAAAAAACT G 
AT AAAAC C T G GAAAG T T AAAGT T G C AGAT AAC G GAGC AAC AAT AAT C GAG 
GGT AT GG AT G C AG AT AAAG C AGAG AAACG AAAAGAAG T T T T GAAT GC C C A 
ATATCCAAAATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAg 
T T AAT GT AGAGG GT T C C AAAGT T GGT G AAC AAT AC AAAG CAT T GAAT CCA 
AT AAAT GG AAAAG AT G GT C GAAG AGA GAT T GC T GAAG GT T GGT T AT C AAA 
AAAAAa T aCaGGGGT CAAT GAT C T C GAT AAG AAT AAAT AT AAAAT T GAAT 
T AAC T GT T GAG G G T AAAAC C AC T GT T GAAAC GAAAG AAC T T AAT C AAC C A 
CT AGAT GTCGTTGT G CT AT TAG AT AAT T C AAAT AG TAT GAAT AAT GAAAG 
AG C CAAT AAT T C T C AAAG AGC AT T AAAa G C T GG GG AAG C AG T T G AAAAG C 
TGATT GAT AAAATT ACAT C AAAT AAAGAC AAT AG AG T AG CT C T T GT GACA 
TAT G C C T C AAC CAT T T T T GAT GGT AC T G AAG CG AC C GT AT C AAAG G G AGT 
TGCCGATCAAAATGGTAAAGCGCTGAATGATAGTGTATCATGGGATTATC 
AT AAAAC T AC T T T T AC AG CAACT AC AC AT AAT T AC AGT T AT T T AAAT T T A 
AC AAAT GAT G C T AAC GAAGT T AAT AT T C T AAAGT C AAGAAT T C C AAAG G A 
AGCGGAGCATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACAT 
T T AC T C AAAAAGC T C T AAT G AAAGC AAAT GAAAT T T T AG AG AC ACAAAG T 
TCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTAC 
GAT GT C T TAT G C CAT AAAT T T T AAT C C TT AT AT AT C AAC AT CT T AC C AAA 
AC C AGT T T AAT T C T T T T T T AAAT AAAAT AC C AG AT AG AAG T GGT AT T CT C 
C AAGAG GAT T T TAT AAT CAAT GGT GAT GAT TAT C AAAT AG T AAAAGG AGA 
T GG AGAG AGT T T T AAAC T GT T T T CG G AT AGAAAAGT T C CT GT T AC T G GAG 
GAAC G AC AC AAG C AG C T TAT CG AGT AC C G C AAAAT C AAC T C T C T GT AAT G 
AG T AAT G AGGG AT AT G C AAT T AAT AGT GG AT AT AT T T AT C T C T AT T GG AG 
AG AT TAG AAC T GGGTCT AT C CAT T T GAT C CT AAG AC AAAG AAAGT T T CT G 
C AAC GAAAC AAAT C AAAACT C AT G GT G AG C CAAC AAC AT TAT AC T T T AAT 
GG AAAT AT AAG AC C T AAAG G T TAT GAC AT T T T TACT GT T G G GAT T GGT G T 
AAACGGAGATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAAT 
CAAT AT CAAGTAAAACAGAAAATT AT ACTAATGTT GAT GAT ACAAAT AAA 
AT T T AT GAT GAG C T AAAT AAAT AC T T T AAAAC AAT T GT T G AGG AAAAAC A 
T T C T AT T G T T G AT GG AAAT GT G AC T GAT C C TAT G GG AG AG AT GAT T GAAT 
T C CAAT T AAAAAAT GGT C AAAG T T T T AC AC AT GAT G AT T AC G tTTTGGtT 
GGAAATGAtGGCAGT CAAT TAAAAAATGGTGTGGCTCTT GGT GGACCAAA 
C AGT G AT GGGG G AAT T T T AAAAG AT GT T AC AGT GAC T T AT GAT AAG AC AT 
CT C AAAC CAT C AAAAT CAAT CAT T T G AAC T TAG GAAGT G GAC AAAAAGT A 
GT T CT T AC CT AT GAT G T AC GT T T AAAAG AT AACT AT AT AAGT AAC AAAT T 
T TAG AAT AC AAAT AAT CGT AC Aa C G C T AAGT C CG AAG AGT GAAAAAG AAC 
CAAATACTATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAG 
TTTCCGGTACTAACCATCAGTAATCAGAAGAAAATGGGTGAGGTTGAATT 
TAT T AAAG T T AAT AAAG AC AAAC AT T C AG AAT CGCTTTTGG GAG C T AAGT 
TTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCCA 
G AGG GAAGT GAT GT T AC AAC AAAG AAT GAT GGT AAAAT T TAT T T T AAAG C 
AC T T C AAGAT G G T AACT AT AAAT TAT AT GAAAT T T C AAGT C C AG AT GG C T 
ATATAGAGGTTAAAACGAAACCTGTTGTGACATTTACAATTCAAAATGGA 
GAAGT TAG GAAC CT G AAAGC AG AT C C AAAT G C T AAT AAAAAT C AAAT C G G 
GTATCTTGAAGGAAATGGTAAACATCTTATTACCAACACTCCCAAACGCC
CACCAGGTGTT 

SEQ ID NO. 4106: SAG0649 FROM COH1 GBS TYPE III STRAIN 

GGT GAAACCCAAGAT AC CAAT CAAGCACTTGGAAAAG 
T AAT T GT T AAAAAAACG G GAG AC Aa T G C T AC AC C AT T AGG C AAAGC GAC T 
TTTGT GTTAAAAAATGACAATGAT AAGT CAGAAACAAGT CACGAAACGGT 
AGAGG GT T CT GG A r AAG C AACC T TT GAAAAC AT AAAAC C T G GAG ACT AC A 
C AT T AAG AG AAG AAAC AG C AC CAAT T GGT T AT AAAAAAAC T GAT AAAAC C 
T GGAAAGT TAAAGT T GC AG AT AACGGAGCAAC AAT AAT C GAG GGT AT GGA 
T G C AGAT AAAG C AG AGAAACG AAAAG AAGT T T T GAAT G C C CAAT AT C C AA 
AATCAGCTATTTATGAGGATACAAAAGAAAATTACCCATTAgTTAATGTA 
GAGGGTT C CAAAGTT GGTGAAC AAT a CAAAGCAT TGAAT C CAATAAATGG 
AAAAG AT GGT CGAAGAGAGATT GCT GAAGGTTGGT T AT C AAAAAAAAAT A 
CAGGGGTCAATGATCTCgATAAGAATAAATATAAAATTGAATTAACTGTT 
GAGGGTAAAACCACTGTTGAAACGAAAGAACTTAATCAACCACTAGATGT 
C GT T G T G CT AT T AGAT AAT T C AAAT AGT AT GAAT AAT G AAAG AG C CAAT A 
ATT CT CAAAGAGCATT AAAAGCTGGGG AAGCAGT TGAAAAGCTGAT T GAT 
AAAAT T ACAT C AAAT AAAG AC AAT AG AG T AG CT C T T GT G AC AT AT G C CT C 
AACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATC 
AAAAT GGT AAAG C G C T GAAT GAT AG T GT AT CAT GG G AT TAT CAT AAAAC T 
ACTTTTACAGCAACTACACATAATTACAGTTATTTAAATTTAACAAATGA 
T GCT AACGAAGTTAAT ATT CT AAAGT CAAGAATT CCAAAGGAAGCGGAGC 
ATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAA 
AAAGCT CT AAT GAAAG C AAAT G AAAT T T TAG AG AC AC AAAGT T C T AAT G C 
TAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCTT 
AT GCCAT AAAT TTT AAT CCTT AT ATAT CAACAT CTT ACC AAAAC CAGT TT 
AATT CTTT T TTAAAT AAAAT AC CAGATAGAAGTGGT ATT CTC CAAGAGGA 
TTT TAT AAT CAAT GGT GAT GAT TAT CAAAT AGT AAAAGG AGAT GGAGAGA 
GTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAGGAACGACA 
CAAGCAGCT TAT CGAGTAC CGC AAAAT CAACT CT CT GTAATGAGTAATGA 
GGGATATGCAATTAATAGTGGATATATTTATCTCTATTGGAGAGATTACA 
AC T GGGT C TAT C CAT T T GAT C C T AAG AC AAAG AAAGT T T CT G C AAC G AAA 
CAAAT CAAAACT CAT GGTGAGC CAACAACAT TAT ACTTT AAT GGAAAT AT 
AAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAG 
AT C C T GGT G C AAC T C CT C TT GAAGC T GAG AAAT T TAT GC AAT CAAT AT C A 
AGT AAAAC AG AAAAT TAT ACT AAT G T T GAT GAT AC AAAT AAAAT T TAT GA 
T GAG C T AAAT AAAT ACT T T AAAAC AAT T GT T GAG G AAAAAC AT T C TAT T G 
T T GAT GGAAAT GT GAC T GAT C CT AT GGG AGAG AT GAT T GAAT T C CAAT T A 
AAAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGTT GGAAAT GA 
TGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAAACAGTGATG 
GGGGAAT TTT AAAAGAT GT T ACAGT GACT T ATGAT AAGACAT CT C AAACC 
AT C AAAAT CAAT CAT T T G AAC T T AGGAAGT GG AC AAAAAGT AGT T C T T AC 
CTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAATA 
CAAAT AAT C GT AC AAC G CT AAGT C CG AAGAGT G AAAAAG AAC CAAAT AC T 
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGT 
ACTAACCATC AGT AAT C AGAAGAAAAT GGGT GAGGTT GAAT T TAT T AAAG 
T T AAT AAAG AC AAAC AT T C AgAAT CG C T T T T G GG AG C T AAGT T T C AACT T 
C AG AT AG AAAAAG AT T T T T CT GG GT AT AAGC AAT T T GT T C C AG AGG GAAG 
T GAT GTTACAACAAAGAAT GATGGT AAAAT TT AT T TTAAAGCACTT CAAG 
ATGGTAACTATAAATTATATGAAATTTCAAGTCCAgATGGCTATATAGAG
GT T AAAAC G AAAC C T GT T GT G AC AT T T AC AAT T C AAAAT GGAGAAGT T AC 
GAAC CTGAAAGC AGAT C CAAATGCT AAT AAAAAT CAAAT CGGGT AT CT T G 
AAGG AAAT GGT AAAC AT CT T AT T AC C AAC ACT C C C AAAC G C C C AC C AG G T 
GTT 

SEQ ID NO. 4107: SAG0649 FROM M781 GBS TYPE III STRAIN

TTGGAAAAGTAATTGTTAAAAAAACGGGAGACACTGCTACACCATTAGGC
AAAGC GAC T T T TGT GT T AAAAAATG AC AAT GAT AAGT CAGAAACAAGT C A 
CGAAACGGTAGAGGGTTCTGGAAAAGC AAC CTTT GAAAAC AT AAAAC CTG 
G AG AC T AC AC ATT AAG AG AAGAAACAGC ACC AATT GGT TAT AAAAAAAC T 
GAT AAAAC CTG G AAAGT TAAAGT T G C AG AT AAC GG AG C AmC AAT AAT C G A 
GG G T AT G GAT G C AG AT AAAG C AG AG AAAC G AAAAG AAG T T T T GAAT G C C C 
AAT AT C C AAAAT C AGCT AT T T AT GAG GAT AC AAAAG AAAAT T AC CC ATT A 
gTTAATGTAGAGGGTTCCAAAGTTGGTGAACAATACAAAGCATTGAATCC
AAT AAAT GG AAAAG AT GG T C gAAG AG AG AT T G CT G AAGGT T G GT TAT C AA 
AAAAAATTACaGGGGTCAATGATCTCGATAAGAATAAATATAAAATTGAA 
T T AACT GT T GAG G GT AAAAC C ACT GT T GAAAC g AAAG AACT T AAT C AAC C 
ACTAGATGTCGTTGTGCTATTAGATAATTCAAATAGTATGAATAATGAAA 
GAG C C AAT AAT T C T C AAAG AGC AT T AAAAG C T GGGG AAGC AG T T GAAAAG 
C T GAT T G AT AAAAT T AC AT C AAAT AAAG AC AAT AGAG TAGC T C T TGT GAC 
AT AT G C C T C AAC CAT T T T T GAT GGT AC T GAAG C G AC CGT AT C AAAGG G AG 
T T GC C GAT C AAAAT GGT AAAGC GCT G AAT G AT AGT GT AT CAT GG GAT TAT 
C AT AAAACT ACT T T T AC AG C AACT AC AC AT AAT T AC AGT T AT T T AAAT T T 
AAC AAAT GAT G C T AACGAAG T T AAT AT T C T AAAGT C AAG AAT T C C AAAGG 
AAG C GG AGC AT AT AAAT G G G G AT CGC AC G C T C TAT C AAT T T GGT G CGAC A 
T T T AC T C AAAAAG CT C T AAT G AAAGC AAAT GAAAT T T T AG AG ACAC AAAG 
TTCTAATGCTAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTA 
C GAT GT C T T AT G C CAT AAAT T T T AAT C C T TAT AT AT C AAC AT C T T AC C AA 
AAC CAGTT T AAT T CTT TT T T AAAT AAAAT ACCAGATAGAAGTGGT ATT CT 
C C AAG AG GAT T T TAT AAT C AAT GGT GAT GAT TAT C AAAT AGT AAAAGG AG 
AT G G AG AGAGT T T T AAACT GT T T T C GGAT AG AAAAGT T C C T G T TAG T G GA 
G GAAC GAC AC AAGC AG CT TAT CGAGT AC C G C AAAAT C AAC T CT C T G T AAT 
GAGT AAT GAGGGAT AT GCAATT AAT AGT GGATATATTT AT CT CT At TGGA 
GAG AT T AC AACT GGG T C T AT C CAT T T GAT C C T AAG AC AAAG AAAGT T T CT 
G C AAC GAAAC AAAT C AAAAC T C AT GG T GAG C C AAC AAC AT TAT AC T T T AA 
TGGAAATATAAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTG 
TAAACGGAGATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAA 
T CAAT AT CAAGT AAAACAGAAAAT T AT ACT AAT GTTGAT G AT ACAAAT AA 
AAT T TAT GAT G AGC T AAAT AAAT AC T T T AAAAC AAT T GT T G AGGAAAAAC 
AT T C T AT T G T T GAT GG AAAT GT GAC T GAT C C T AT G G GAG AG AT GAT T GAA 
TTCCAATTAAAAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGT 
TGGAAATGATGGCAGTCAATTAAAAAATGGTGTGGCTCTTGGTGGACCAA 
AC AG T GAT G GG GG AAT T T T AAAAG AT GT T AC AGT GAC T T AT GAT AAG AC A 
T CT CAAAC CAT C AAAAT CAAT CAT TT GAAC T TAG GAAGT G GAC AAAAAGT 
AGTTCTTACCTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAAT 
T T T AC AAT AC AAAT AAT C G T AC AACG C T AAGT C C GAAG AGT GAAAAAGAA 
CCAAATACTATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGA 
G T T T C CGG T AC T AAC CAT C AGT AAT C AG AAG AAAAT GGGT GAGGT T G AAT 
T TAT T AAAGT T AAT AAAG AC AAAC AT T C AG AAT C G C T T T T GG G AG C T AAG 
TTTCAACTTCAGATAGAAAAAGATTTTTCTGGGTATAAGCAATTTGTTCC 
AGAGGG AAGT GAT GT T AC AAC AAAG AAT GAT G GT AAAAT T T AT T T T AAAG 
C AC T T C AAG AT GGT AAC TAT AAAT TAT AT GAAAT T T C AAG T C C AG AT GG C 
TAT AT AG AG GT T AAAAC GAAAC CT GT T G T GAC AT T T AC AAT T C AAAAT GG 
AG AAGT T ACG AAC CT GAAAGC AG AT C C AAAT GC T AAT AAAAAT C AAAT C G 
G GT AT CTT GAAG GAAAT GGT AAAC AT C T TAT T AC C AAC ACT C C C AAAC G C 
CCACCAGGTGTT 

SEQ ID NO. 4108: SAG0649 FROM CJB110 GBS NONTYPEABLE STRAIN 

GGT GAAAC C C AAGAT AC CAAT C AAG C AC T T G GAAAAGT 
AAT TGT T AAAAAAAC G G GAGAC Aa T G C T AC AC CAT T AGGC AAAG C GAC T T 
T T G T GT T AAAAAAT GAC AAT GAT AAGT C AG AAAC AAGT C AC GAAAC G GT A 
GAG G GT T CT GGAr AAGC AAC C T T T G AAAAC AT AAAAC C T G G AGACT AC AC 
AT T AAGAG AAGAAAC AG C AC CAAT T G GT TAT AAAAAAACT GAT AAAAC C T 
G GAAAG T T AAAGT T GC AG AT AAC G GAG C AAC AAT AAT C G AGG G T AT GGAT 
G CAGAT AAAG C AGAG AAAC GAAAAG AAG T T T T G AAT G C C CAAT AT C C AAA 
AT C AG C TAT T TAT GAG GAT AC AAAAG AAAAT T AC C CAT T Ag T T AAT GT AG 
AGGGT T C C AAAG T T GG T GAAC AAT AC AAAG CAT T G AAT C CAAT AAAT GG A 
AAAG AT G GT C G AAGAG AG AT T G C T GAAG G T T G G T TAT C AAAAAAAAT T AC 
a G G G GT CAAT GAT C T C GAT AAG AAT AAAT AT AAAAT T G AAT T AAC T GT T G 
AGG G T AAAAC C ACT GT T GAAAC GAAAG AAC T T AAT C AAC C AC TAG AT G T C 
GTTGTGCTATTAgATAATTCAAATAGTATGAATAATGAAAGAGCCAATAA 
T T C T C AAAG AG CAT T AAAAG C T G G G GAAG C AG T T GAAAAG C T GAT T GAT A 
AAAT T AC AT C AAAT AAAGAC AAT AG AGT AG C T C T T GT GAC AT AT G C CT C A 
ACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATCA 
AAAT G GT AAAG C G C T GAAT GAT AGT G T AT CAT G GGAT TAT CAT AAAAC T A 
CTT T TAG AG C AACT AC AC AT AAT T AC AGT TAT T T AAAT T T AAC AAAT GAT 
GCTAACGAAGTTAATATTCTAAAGTCAAGAATTCCAAAGGAAGCGGAGCA 
TATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAAA 
AAG CT CT AAT GAAAG C AAAT G AAAT T T T AGAGAC AC AAAGT T CT AAT G C T 
AGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCTTA 
T G C CAT AAAT T T T AAT C C T TAT AT AT C AAC AT C T T AC C AAAAC C AGT T T A 
AT T C T T T T T T AAAT AAAAT AC C AG AT AG AAGT G GT AT T C T C C AAGAGGAT 
T T TAT AAT C AAT GGT GAT GAT TAT C AAAT AGT AAAAG GAG AT GG AGAG AG 
T T T T AAAC T GT T T T C GG AT AG AAAAGT T C C T GT T ACT GG AGGAAC GAC AC 
AAG C AG C T TAT C GAG T AC C G C AAAAT C AACT C T CT GT AAT G AGT AAT GAG 
GG AT AT G C AAT T AAT AGT GG AT AT AT TT AT C T CT ATT G G AG AGAT T AC AA 
CTGGGTCTATCCATTTGATCCTAAGACAAAGAAAGTTTCTGCAACGAAAC 
AAAT C AAAAC T CAT G GT G AG C C AAC AAC AT TAT ACT T T AAT GG AAAT AT A 
AG AC C T AAAGGT TAT GAC AT T T T T AC T GT T GGG AT T GG T GT AAAC GG AG A 
T C C T G GT GC AAC T C CT C T T GAAGCT GAG AAAT T TAT G C AAT C AAT AT C AA 
G T AAAAC AG AAAAT TAT AC T AAT GT T GAT GAT AC AAAT AAAAT T TAT GAT 
GAG C T AAAT AAAT ACT T T AAAAC AAT T GT T GAG G AAAAAC AT T C TAT TGT 
TGAT GGAAAT GTG ACT GAT CCT ATGGG AGAGAT GAT T GAAT T CCAATTAA 
AAAATGGTCAAAGTTTTACACATGATGATTACGTTTTGGTTGGAAATGAt 
GGC AGT C AAT TAAAAAAT GGT GTGGCTCTTGGTGGACC AAAC AGT GATGG 
GGGAATTTTAAAAGATGTTACAGTGACTTATGATAAGACATCTCAAACCA 
TCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTAGTTCTTACC 
TAT GAT GT AC GT T T AAAAG AT AAC TAT AT AAG T AAC AAAT TT T AC AAT AC 
AAAT AAT C GT AC AAC G CT AAG T C C G AAGAGT G AAAAAGAAC C AAAT ACT A 
TTCGTGATTTCCCAATtCCCAAAATTCGTGATGTTCGTGAGTTTCCGGTA 
C T AAC CAT C AGT AAT CAGAAG AAAAT G GGT G AG GT T GAAT T T AT T AAAGT 
T AAT AAAG AC AAAC AT T C AG AAT C GC T T T T GGG AG C T AAGT T T C AAC T T C 
AG AT AGAAAAAG AT T T T T CT GGGT AT AAG C AAT T T G T T C C AG AG GG AAGT 
GATGTTACAACAAAGAATGATGGTAAAATTTATTTTAAAGCACTTCAAGA 
T G G T AAC TAT AAAT TAT AT G AAAT T T C AAG T C C AGAT GG C TAT AT AG AG G 
T T AAAACG AAAC C T GT T GT G AC AT T T AC AAT T C Aa AAT GG AG AAG T T AC G 
AAC C T G AAAG C AG AT C C AAAT G C T AAT AAAAAT C AAAT C G G GT AT C T T G A 
AGGAAAT GGT AAAC AT C T TAT T AC C AAC AC T C C C AAAC G C C CAC C AG GT G 
TT 

SEQ ID NO. 4109: SAG0649 FROM JM9130013 GBS TYPE VIII STRAIN 

GGT G AAAC C C AAGAT AC C AAT CAAG C AC T T G G AAAAG 
T AAT TGT T AAAAAAAC GGG AG AC AAT GC T AC AC C AT TAG G C AAAG C GAC T 
T T T GT GT TAAAAAAT GAC AAT GAT AAGT C AGAAAC AAG T CAC G AAAC GGT 
AG AGG GT T C T G GAG AAG C AAC C T T T GAAAAC AT AAAAC C T G GAG AC T AC A 
CAT T AAGAGAAGAAACAG C ACC AAT TGGT TATAAAAAAACT GAT AAAACC 
TGGAAAGTTAAAGTTGCAGATAACGGAGCAACAATAATCGAGGGTATGGA 
T G C AG AT AAAG C AGAG AAAC GAAAAG AAG T T T T GAAT G C C C AAT AT C C AA 
AAT C AG C T AT T TAT GAGGAT AC AAAAG AAAAT T AC C CAT T AGT T AAT GT A 
GAG G GT T C C AAAGT T GGT GAAC AAT AC AAAG CAT T GAAT C C AAT AAAT GG 
AAAAG AT G G T C GAAG AGAGAT T G C T G AAG G T T G GT TAT C AAAAAAAAT T A 
C AGGGGT C AAT G AT CT CG AT AAG AAT AAAT AT AAAAT T GAAT T AAC T GT T 
GAG G GT AAAAC C ACT G T T GAAAC GAAAG AAC T T AAT C AAC CAC TAG AT GT 
CGT T GT G CT AT T AGAT AAT T C AAAT AG TAT GAAT AAT G AAAG AGC C AAT A 
AT T C T C AAAGAG CAT T AAAAG CT GGGGAAG C AGT T GAAAAG CT G AT T GAT 
AAAAT T AC AT C AAAT AAAGAC AAT AGAGT AG C T C T T G T GAC AT AT G C C T C 
AACCATTTTTGATGGTACTGAAGCGACCGTATCAAAGGGAGTTGCCGATC 
AAAAT GGT AAAG C GC T GAAT GAT AGT GT AT CAT GGG ATT AT C AT AAAACT 
AC T T T T AC AG C AAC T AC AC AT AAT T AC AGT TAT T T AAAT T T AAC AAAT G A 
TGCTAACGAAGTT AAT ATT CT AAAGT CAAG AAT TCCAAAGGAAGCGG AGC 
ATATAAATGGGGATCGCACGCTCTATCAATTTGGTGCGACATTTACTCAA 
AAAG C T C T AAT G AAAG C AAAT G AAAT T T T AG AGAC AC AAAG T T CT AAT GC 
TAGAAAAAAACTTATTTTTCACGTAACTGATGGTGTCCCTACGATGTCTT 
AT G C CAT AAAT T T T AAT CCT TAT AT AT C AAC AT CT T AC C AAAAC C AGT T T 
AAT T C T T T T T T AAAT AAAAT ACC AG AT AG AAGT GG TAT T C T C CAAG AG G A 
TT T TAT AAT C AAT GGT GAT GAT TAT C AAAT AG T AAAAG GAG AT G GAG AG A 
GTTTTAAACTGTTTTCGGATAGAAAAGTTCCTGTTACTGGAGGAACGACA 
CAAG C AG C T TAT C G AGT AC C G C AAAAT C AACT C T C T G T AAT G AG T AAT G A 
GGG AT AT G C AAT T AAT AGT G GAT AT AT T T AT C T C T AT T GG AG AG AT T AC A 
ACTGGGTCTATCC ATT TGAT CCT AAGACAAAGAAAGTTTCTGCAACGAAA 
CAAATCAAAACTCATGGTGAGCCAACAACATTATACTTTAATGGAAATAT 
AAGACCTAAAGGTTATGACATTTTTACTGTTGGGATTGGTGTAAACGGAG 
ATCCTGGTGCAACTCCTCTTGAAGCTGAGAAATTTATGCAATCAATATCA 
AG T AAAAC AGAAAAT TAT ACT AAT G T T GAT G AT AC AAAT AAAAT T TAT GA 
T GAG C T AAAT AAAT ACT T T AAAAC AAT T GT T GAG GAAAAAC AT T CT AT T G 
T T GAT GG AAAT GT G AC T GAT CC T AT G G G AG AGAT GAT T GAAT T CC AAT T A 
AAAAAT GGT C AAAGT T T TAG AC AT GAT GAT T AC GT T T T GGT T GG AAAT G A 
T GG C AG T C AAT T AAAAAAT GGTGTGGCTCTTGGTG G ACC AAAC AGT G AT G 
GGG GAAT T TT AAAAGAT GT T AC AGT G AC T TAT G AT AAGAC AT C T C AAAC C 
ATCAAAATCAATCATTTGAACTTAGGAAGTGGACAAAAAGTAGTTCTTAC 
CTATGATGTACGTTTAAAAGATAACTATATAAGTAACAAATTTTACAATA 
CAAATAATCGTACAACGCTAAGTCCGAAGAGTGAAAAAGAACCAAATACT 
ATTCGTGATTTCCCAATTCCCAAAATTCGTGATGTTCGTGAGTTTCCGGT 
AC T AAC CAT C AGT AAT C AAAAGAAAAT G GGT GAGGT T GAAT T T AT T AAAG 
TT AAT AAAGACAAACATT C AGAAT CGCTT T TGGGAG CTAAGTT T CAACT T 
C AG AT AAAAAAAG AT T T T T CT GGGT AT AAG C AAT T T G T T C C AG AGG G AAG 
T GAT GT TACAACAAAGAAT GAT GGT AAAATTTATTTT AAAG CACTT CAAG 
AT GG T AACT AT AAAT T AT AT GAAAT T T C AAGT C C AG AT G GC TAT AT AG AG 
GT T AAAAC G AAAC CT GT T GT G AC AT T T AC AAT T C AAAAT G G AG AAGT T AC 
GAAC C T G AAAG C AG AT C C AAAT GC T AAT AAAAAT C AAAT C G GGT AT C T T G 
AA 

SEQ ID NO. 4110: SAG0649 FROM 2603 V/R GBS TYPE V STRAIN 

MKKRQKIWRGLSVTLLILSQIPFGILVQGETQDTNQALGKVIVKKTGDNATPLGKATFVL 
KNDNDKSETSHETVEGSGEATFENIKPGDYTLREETAPIGYKKTDKTWKVKVADNGATII 
EGMDADKAEKRKEVLNAQYPKSAIYEDTKENYPLVNVEGSKVGEQYKALNPINGKDGRRE 
IAEGWLSKKITGVNDLDKNKYKIELTVEGKTTVETKELNQPLDVVVLLDNSNSMNNERAN 
NSQRALKAGEAVEKLIDKITSNKDNRVALVTYASTIFDGTEATVSKGVADQNGKALNDSV 
SWDYHKTTFTATTHNYSYLNLTNDANEVNILKSRIPKEAEHINGDRTLYQFGATFTQKAL 
MKANEILETQSSNARKKLIFHVTDGVPTMSYAINFNPYISTSYQNQFNSFLNKIPDRSGI 
LQEDFIINGDDYQIVKGDGESFKLFSDRKVPVTGGTTQAAYRVPQNQLSVMSNEGYAINS 
GYIYLYWRDYNWVYPFDPKTKKVSATKQIKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNG 
DPGATPLEAEKFMQSISSKTENYTNVDDTNKIYDELNKYFKTIVEEKHSIVDGNVTDPMG 
EMIEFQLKNGQSFTHDDYVLVGNDGSQLKNGVALGGPNSDGGILKDVTVTYDKTSQTIKI 
NHLNLGSGQKVVLTYDVRLKDNYISNKFYNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVR 
EFPVLTISNQKKMGEVEFIKVNKDKHSESLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKN 
DGKIYFKALQDGNYKLYEISSPDGYIEVKTKPVVTFTIQNGEVTNLKADPNANKNQIGYL 
EGNGKHLITNTPKRPPGVFPKTGGIGTIVYILVGSTFMILTICSFRRKQL 

SEQ ID NO. 4111: SAG0649 FROM 090 GBS TYPE Ia STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4112: SAG0649 FROM A909 GBS TYPE Ia STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4113: SAG0649 FROM 18RS21 GBS TYPE II STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4114: SAG0649 FROM M732 GBS TYPE III STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKNTGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4115: SAG0649 FROM COH1 GBS TYPE III STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGXATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKNTGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4116: SAG0649 FROM M781 GBS TYPE III STRAIN 

GKVIVKKTGDTATPLGKATFVLKNDNDKSETSHETVEGSGKATFENIKPGDYTLREETAP 
IGYKKTDKTWKVKVADNGAXIIEGMDADKAEKRKEVLNAQYPKSAIYEDTKENYPLVNVE 
GSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVEGKTTVETKEL 
NQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVALVTYASTIFD 
GTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEVNILKSRIPKE 
AEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPTMSYAINFNPY 
ISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDRKVPVTGGTTQ 
AAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQIKTHGEPTTL 
YFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDDTNKIYDELNK 
YFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQLKNGVALGGPN 
SDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKFYNTNNRTTLS 
PKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSESLLGAKFQLQ 
IEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEVKTKPVVTFTI 
QNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4117: SAG0649 FROM CJB110 GBS NONTYPEABLE STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGXATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIEKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLEGNGKHLITNTPKRPPGV 

SEQ ID NO. 4118: SAG0649 FROM JM9130013 GBS TYPE VIII STRAIN 

GETQDTNQALGKVIVKKTGDNATPLGKATFVLKNDNDKSETSHETVEGSGEATFENIKPG 
DYTLREETAPIGYKKTDKTWKVKVADNGATIIEGMDADKAEKRKEVLNAQYPKSAIYEDT 
KENYPLVNVEGSKVGEQYKALNPINGKDGRREIAEGWLSKKITGVNDLDKNKYKIELTVE 
GKTTVETKELNQPLDVVVLLDNSNSMNNERANNSQRALKAGEAVEKLIDKITSNKDNRVA 
LVTYASTIFDGTEATVSKGVADQNGKALNDSVSWDYHKTTFTATTHNYSYLNLTNDANEV 
NILKSRIPKEAEHINGDRTLYQFGATFTQKALMKANEILETQSSNARKKLIFHVTDGVPT 
MSYAINFNPYISTSYQNQFNSFLNKIPDRSGILQEDFIINGDDYQIVKGDGESFKLFSDR 
KVPVTGGTTQAAYRVPQNQLSVMSNEGYAINSGYIYLYWRDYNWVYPFDPKTKKVSATKQ 
IKTHGEPTTLYFNGNIRPKGYDIFTVGIGVNGDPGATPLEAEKFMQSISSKTENYTNVDD 
TNKIYDELNKYFKTIVEEKHSIVDGNVTDPMGEMIEFQLKNGQSFTHDDYVLVGNDGSQL 
KNGVALGGPNSDGGILKDVTVTYDKTSQTIKINHLNLGSGQKVVLTYDVRLKDNYISNKF 
YNTNNRTTLSPKSEKEPNTIRDFPIPKIRDVREFPVLTISNQKKMGEVEFIKVNKDKHSE 
SLLGAKFQLQIKKDFSGYKQFVPEGSDVTTKNDGKIYFKALQDGNYKLYEISSPDGYIEV 
KTKPVVTFTIQNGEVTNLKADPNANKNQIGYLE 

SEQ ID NO. 4201: 2603 V/R STRAIN 

ATGGTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGGAATAAAGCTAACCTTTTC 
ACTGGATGGGCTGACGTAGATCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGG 
AAATTAATTCAAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGT 
GCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAA 
AAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAATAAAGCAGAA 
GCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGTCGTTCATATGATGTATTG 
CCTCCAGATATGGCTAAAGATGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCA 
CTAGATGATTCTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTT 
CCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGT 
GCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAGATGATGAA 
ATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTCGAATTTGATGAAAAATTA 
AACCTTGTTTCAGAATATTACTTAGGTAAA 

SEQ ID NO. 4202: 090 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTG 
GAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAA 
AAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGT 
ATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAAC 
AACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAA 
AATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAAT 
AAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCG 
TCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATT 
CAGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCA 
GATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGA 
AGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTG 
CACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCA 
GATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTT 
CGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA 

SEQ ID NO. 4203: A909 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGG 
AATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAAA 
AGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGTA 
TTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAACA 
ACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAAA 
ATCATGGCGCTTAAACGAACGTCATTACGGTGGATTGACAGGAAAAAATA 
AAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGT 
CGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATTC 
AGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCAG 
ATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGAA 
GATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTGC 
ACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAG 
ATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTC 
GAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA 

SEQ ID NO. 4204: H36B STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAG 

TGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGA 
AAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAG 
GTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAA 
ACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGA 
AAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAA 
ATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGG 
CGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACA 
TTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTC 
CAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGG 
GAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGG 
TGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGT 
CAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTT 
TTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAA 
A 

SEQ ID NO. 4205: 18RS21 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGG 
AATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAAA 
AGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGTA 
TTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAACA 
ACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAAA 
ATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAATA 
AAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGT 
CGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATTC 
AGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCAG 
ATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGAA 
GATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTGC 
ACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAG 
ATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTC 
GAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA 

SEQ ID NO. 4206: M732 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCTGAGTGG 
AATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTCAGAAAA 
AGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAGCAGGTA 
TTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATCAAAACA 
ACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGTTGAAAA 
ATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAAAAAATA 
AAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATTTGGCGT 
CGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGAACATTC 
AGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTATTCCAG 
ATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTCTGGGAA 
GATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGTTGGTGC 
ACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAATTGTCAG 
ATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTTGTTTTC 
GAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGGTAAA 

SEQ ID NO. 4207: COH1 STRAIN 

GTAAAATTAGTATTCGCACGCCACGG 

TGAATCTGAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAG 
ATCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATT 
CAAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACG 
TGCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGG 
TACCAGTTGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTG 
ACAGGAAAAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGT 
TCATATTTGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAG 
ATGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGAT 
TCTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCT 
TCCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATG 
TGTTTGTTGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATC 
AAACAATTGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCC 
ACCACTTGTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATT 
ACTTAGGTAAA 

SEQ ID NO. 4208: CJB110 STRAIN 

GTAAAATTAGTATTCGCACGCCACGG 

TGAATCTGAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAG 
ATCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATT 
CAAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACG 
TGCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGG 
TACCAGTTGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTG 
ACAGGAAAAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGT 
TCATATTTGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAG 
ATGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGAT 
TCTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCT 
TCCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATG 
TGTTTGTTGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATC 
AAACAATTGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCC 
ACCACTTGTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATT 
ACTTAGGTAAA 

SEQ ID NO. 4209: 1169NT STRAIN 

AGTATTCGCACGCCACGGTGAATCTGAGTGGAATAAAGCTAACCTTTTCA 
CTGGATGGGCTGACGTAGATCTTTCAGAAAAAGGTACACAACAAGCTATT 
GATGCTGGGAAATTAATTCAAGCAGCAGGTATTGAGTTCGACCTTGCTTT 
TACATCAGTTCTTAAACGTGCCATCAAAACAACTAACCTTGCCCTTGAAG 
CAGCTGATCAACTTTGGGTACCAGTTGAAAAATCATGGCGCTTGAACGAA 
CGTCATTACGGTGGATTGACAGGAAAAAATAAAGCAGAAGCAGCTGAACA 
ATTTGGTGATGAGCAAGTTCATATTTGGCGTCGTTCATATGATGTATTGC 
CTCCAGATATGGCTAAAGATGATGAACATTCAGCACATACTGATCGTCGC 
TATGCTTCACTAGATGATTCTGTTATTCCAGATGCAGAAAACCTAAAAGT 
TACTTTAGAGCGTGCTCTTCCTTTCTGGGAAGATAAAATTGCTCCTGCTC 
TTAAAGATGGTAAAAATGTGTTTGTTGGTGCACACGGTAACTCAATCCGT 
GCTCTTGTAAAACATATCAAACAATTGTCAGATGATGAAATCATGGACGT 
TGAAATTCCTAACTTCCCACCACTTGTTTTCGAATTTGATGAAAAATTAA 
ACCTTGTTTCAGAATATTACTTAGGTAAA 

SEQ ID NO. 4210: M781 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGT 

GAATCTGAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGA 
TCTTTCAGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTC 
AAGCAGCAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGT 
GCCATCAAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGT 
ACCAGTTGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGA 
CAGGAAAAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTT 
CATATTTGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGA 
TGATGAACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGATT 
CTGTTATTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTT 
CCTTTCTGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGT 
GTTTGTTGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCA 
AACAATTGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCA 
CCACTTGTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTA 
CTTAGGTAAA 

SEQ ID NO. 4211: JM9130013 STRAIN 

GTAAAATTAGTATTCGCACGCCACGGTGAATCT 

GAGTGGAATAAAGCTAACCTTTTCACTGGATGGGCTGACGTAGATCTTTC 
AGAAAAAGGTACACAACAAGCTATTGATGCTGGGAAATTAATTCAAGCAG 
CAGGTATTGAGTTCGACCTTGCTTTTACATCAGTTCTTAAACGTGCCATC 
AAAACAACTAACCTTGCCCTTGAAGCAGCTGATCAACTTTGGGTACCAGT 
TGAAAAATCATGGCGCTTGAACGAACGTCATTACGGTGGATTGACAGGAA 
AAAATAAAGCAGAAGCAGCTGAACAATTTGGTGATGAGCAAGTTCATATT 
TGGCGTCGTTCATATGATGTATTGCCTCCAGATATGGCTAAAGATGATGA 
ACATTCAGCACATACTGATCGTCGCTATGCTTCACTAGATGATTCTGTTA 
TTCCAGATGCAGAAAACCTAAAAGTTACTTTAGAGCGTGCTCTTCCTTTC 
TGGGAAGATAAAATTGCTCCTGCTCTTAAAGATGGTAAAAATGTGTTTGT 
TGGTGCACACGGTAACTCAATCCGTGCTCTTGTAAAACATATCAAACAAT 
TGTCAGATGATGAAATCATGGACGTTGAAATTCCTAACTTCCCACCACTT 
GTTTTCGAATTTGATGAAAAATTAAACCTTGTTTCAGAATATTACTTAGG 
TAAA 

SEQ ID NO. 4212: 2603 V/R STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4213: 090 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4214: A909 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4215: H36B STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4216: 18RS21 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4217: M732 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4218: COH1 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4219: CJB110 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4220: 1169NT STRAIN 

VFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRAIKT 
TNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLPPDM 
AKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGAHGN 
SIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4221: M781 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4222: JM9130013 STRAIN 

VKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKRA 
IKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVLP 
PDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVGA 
HGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 

SEQ ID NO. 4301: 2603 V/R STRAIN 

ATGAATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATC 
GTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCT 
AATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCT 
GATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAA 
GGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACG 
CTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGT 
CTTATAGAGCGTTTGAGTGkTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAA 
GTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAG 
CCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAA 
CACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTT 
TTTGCAGATGTTGAAAAAGCGTTGCTAGAACTCAAA 

SEQ ID NO. 4302: 090 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCA 

AGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCG 
CGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGG 
TGAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGA 
TATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGC 
CTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGT 
GGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGA 
AACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACG 
TGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGA 
ACCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGA 
AATAACAGAAGTTTTTGCAGATGTTGAAAAAGCGTTG 

SEQ ID NO. 4303: 1169NT STRAIN (REVERSE COMPLEMENT) 

TGGTAAAGGGACTCAAGCAGCTAAGATTGTTGAAGAATTTGGTGTTGCGCACATCTCAAC 
AGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAG 
TTATATTGATAAAGGTGAATTGGTTCCTGATCAAGTAACAAACGGGATTGTAAAAGAGCG 
CTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGGTATCCACGTACTAT 
TGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGT 
TATTAATATTAAAGTGGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAA 
TCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGA 
AGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTCA 
TATTGCTCAAGGAGAACCTATTCTTGAACACTATAGTAAGCTTGGCCTTGTTACAGATAT 
TGAAGGTAATCAAGAAATAA 

SEQ ID NO. 4304: 18RS21 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAACCACGGGTTCGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATCG 
TTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTA 
ATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTG 
ATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAG 
GTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGCTACGC
TTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATCATGTC
TTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAG
TGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGC
CTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAAC 
ACTATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTT
TTGCAGATGTTGAAAAAGCGTTG 

SEQ ID NO. 4305: A909 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAG
CTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCG
CAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAAT
TGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCG
CAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAG
ATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATC
CATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTT
TCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAG
ATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAATCTA
TTCTTGAACACTATCGAAAGCTTGGTCTTGTTACAGATATTGAAGGTAA

SEQ ID NO. 4306: CJB110 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAACCACGGGTTTGCTTGGTGCTGGTAAAGGTACTCAAGCAGCTAA 
GATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAAT 
GGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGT 
TCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGA
AAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCACACGCCTTAGATGC
TACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCATC
ATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACTGGTGAAACTTTCCA 
CAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGA
TAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCT
TGAACACTATAG 

SEQ ID NO. 4307: COH1 STRAIN (REVERSE COMPLEMENT) 

ATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTG 
AAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAATC 
AAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGATG
AAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTT
TTTTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTG
AAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCAACATGCCTTA
TAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGT
TCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTG
AAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACACT
ATCGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTG
CAGATGTTGAAAAAGCGTTG

SEQ ID NO. 4308: H36B STRAIN (REVERSE COMPLEMENT) 

CAGGGGATATGTTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAA 
GTTATATTGATAAAGGTGAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGC
GCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTA
TTGAACAAGCACACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTG
TTATTAATATTAAAGTGGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCA
ATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAG
AAGATTACTATCAACGTGAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTA 
ATATTGCTCAAGGAGAATCTATTCTTGAACACTATCGTAAGCTTGGTCTTGTTACAGATA
TTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCAGATGTTGAAAAAGCGTTG

SEQ ID NO. 4309: JM9130013 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGT
ACTCAAGCAGCTAAGATCGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATG
TTCCGCGCCGCAATGGCTAATCAAACCGAAATGGGACGTTTAGCTAAAAGTTATATTGAT
AAAGGTGAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAG
GATGATATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAACAAGCA
CACGCCTTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATT
AAAGTGGATCCATCATGTCTTATAGAGCGTTTGAGTGGTCGTATTATCAATCGTAAAACT
GGTGAAACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTAT
CAACGTGAAGATGATAAGCCTGAAACTGTTAAACGTCGCTTGGACGTTAATATTGCTCAA
GGAGAACCTATTCTTGAACACTATAAAAAGCTTGGTCTTGTTACAGATATTGAAGGTAAT
CA

SEQ ID NO. 4310: M732 STRAIN (REVERSE COMPLEMENT) 

CTTTTAATTATGGGTTTGCCTGGTGCTGGTAAAGGTACTCAAGCAGCTAAGATTGTTGAA 
GAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGCGCCGCAATGGCTAATCAA
ACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGTGAATTGGTTCCTGATGAA 
GTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGATATCGCAGAAAAAGGTTTT
TTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCCTTAGATGCTACGCTTGAA
GAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTGGATCCAACATGCCTTATA
GAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAAACTTTCCACAAAGTGTTC 
AACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGTGAAGATGATAAGCCTGAA
ACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAAGGAGAACCTATTCTTGAACACTAT
CGTAAGCTTGGTCTTGTTACAGATATTGAAGGTAATCAAGAAATAACAGAAGTTTTTGCA
GATGTTGAAAAAGCGTTG 

SEQ ID NO. 4311: M781 STRAIN (REVERSE COMPLEMENT) 

AATCTTTTAATTACGGGTTTGCCTGGTGCTGGTAAAGGTACTCAA 

GCAGCTAAGATTGTTGAAGAATTTGGTGTTGCTCACATCTCAACAGGGGATATGTTCCGC
GCCGCAATGGCTAATCAAACCCAAATGGGACGTTTAGCTAAAAGTTATATTGATAAAGGT 
GAATTGGTTCCTGATGAAGTAACAAACGGGATTGTAAAAGAGCGCTTAGCTGAGGATGAT 
ATCGCAGAAAAAGGTTTTTTACTTGATGGATATCCACGTACTATTGAGCAAGCACACGCC
TTAGATGCTACGCTTGAAGAACTAGGACTACGCTTAGATGGTGTTATTAATATTAAAGTG
GATCCAACATGCCTTATAGAGCGTTTGAGTGGCCGTATTATCAATCGTAAAACTGGTGAA
ACTTTCCACAAAGTGTTCAACCCACCAGTAGATTATAAAGAAGAAGATTACTATCAACGT
GAAGATGATAAGCCTGAAACTGTCAAACGTCGCTTGGACGTTAATATTGCTCAA

SEQ ID NO. 4312: 2603 V/R STRAIN 

MNLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVP 
DEVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSC 
LIERLSXRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILE 
HYRKLGLVTDIEGNQEITEVFADVEKALLELK 

SEQ ID NO. 4313: 090 STRAIN 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKALLELK

SEQ ID NO. 4314: 1169NT STRAIN 

GKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPDQVTNGIVKER 
LAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIIN 
RKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVHIAQGEPILEHYSKLGLVTDI 
EGNQEI 

SEQ ID NO. 4315: 18RS21 STRAIN 

NLLTTGSPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YRKLGLVTDIEGNQEITEVFADVEKALLE

SEQ ID NO. 4316: A909 STRAIN 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH 
YRKLGLVTDIEG

SEQ ID NO. 4317: A909 STRAIN

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGESILEH
YRKLGLVTDIEG

SEQ ID NO. 4318: CJB110 STRAIN 

NLLTTGLLGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH
Y

SEQ ID NO. 4319: COH1 STRAIN

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI 
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY 
RKLGLVTDIEGNQEITEVFADVEKALL

SEQ ID NO. 4320: H36B STRAIN 

GDMFRAAMANQTEMGRLAKSYIDKGELVPDEVTNGIVKERLAEDDIAEKGFLLDGYPRTI 
EQAHALDATLEELGLRLDGVINIKVDPSCLIERLSGRIINRKTGETFHKVFNPPVDYKEE 
DYYQREDDKPETVKRRLDVNIAQGESILEHYRKLGLVTDIEGNQEITEVFADVEKAL

SEQ ID NO. 4321: JM9130013 STRAIN 

NLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEH 
YKKLGLVTDIEGN 

SEQ ID NO. 4322: M732 STRAIN 

LLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPDE 
VTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCLI 
ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILEHY 
RKLGLVTDIEGNQEITEVFADVEKALLELK

SEQ ID NO. 4323: M781 STRAIN 

NLLITGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTQMGRLAKSYIDKGELVPD 
EVTNGIVKERLAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPTCL 
IERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQ 

SEQ ID NO. 4401 
STRAIN 2603 

GTGGATAAACATCACTCAAAAAAGGCTATTTTAAAGTTAACA

CTTATAACAACTAGTATTTTATTAATGCATAGCAATCAAGTGAATGCAGAGGAGCAAGAA
TTAAAAAACCAAGAGCAATCACCTGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCG
GTAACTACTAATACTGTTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGCG
AAAGAAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAGAG
TTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAAGAATATCCCTCT
AAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGTAACAAATGCTTCAACTGCAATA
GCACAGAAAGTTCCCTCAGCATATGAAGAGGTGAAGCCAGAAAGCAAGTCATCGCTTGCT
GTTCTTGATACATCTAAAATAACAAAATTACAAGCCATAACCCAAAGAGGAAAGGGAAAT
GTAGTAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATAGC
CCAAAAGATGATAAGCACAGCTTTAAAACTAAGACAGAATTTGAGGAATTAAAAGCAAAA
CATAATATCACTTATGGGAAATGGGTTAACGATAAGATTGTTTTTGCACATAACTACGCC
AACAATACAGAAACGGTGGCTGATATTGCAGCAGCTATGAAAGATGGTTATGGTTCAGAA
GCAAAGAATATTTCGCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGT 
CCAGCAATCAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAATG 
CGTATTCCAGATAAAATTGATTCGGACAAATTTGGTGAAGCATATGCTAAAGCAATCACA
GACGCTGTTAATCTAGGAGCAAAAACGATTAATATGAGTATTGGAAAAACAGCTGATTCT
TTAATTGCTCTCAATGATAAAGTTAAATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTT
GCAGTTGTTGTGGCTGCCGGAAATGAAGGCGCATTTGGTATGGATTATAGCAAACCATTA 
TCAACTAATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTTGAGT 
GTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAACAACTATTGAAGGT 
AAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCTTTTGACAAAGGTAAGGCCTACGAT
GTGGTTTATGCCAATTATGGTGCAAAAAAAGACTTTGAAGGTAAGGACTTTAAAGGTAAG
ATTGCATTAATTGAGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACA 
AATGCAGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTCTA 
ATTCCTTACCGTGAATtACCTGTGGGGATTATTAGTAAAGTAGATGGCGAGCGTATAAAA
AATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTGAAGTAGTTGATAGCCAAGGTGGT 
AATCGTATGCTGGAACAATCAAGTTGGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGAT 
GTAACAGCTTCTGGCTTTGAAATTTATTCTTCAACCTATAATAATCAATACCAAACAATG
TCTGGTACAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGTCAT
TTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAATTGCTAGAATTGTCTAAA
AACATCCTCATGAGCTCAGCAACAGCATTATATAGTGAAGAGGATAAGGCGTTTTATTCA
CCACGTCAGCAAGGTGCAGGTGTAGTTGATGCTGAAAAAGCTATCCAAGCTCAATATTAT
ATTACTGGAAACGATGGCAAAGCTAAAATTAATCTCAAACGAATGGGAGATAAATTTGAT
ATCACAGTTACAATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCTAAT
GTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTAAACCACAAGCCTTGCTAGAT
ACTAATTGGCAGAAAGTAATTCTTCGTGATAAAGAAACACAAGTTCGATTTACTATTGAT
GCTAGTCAATTTAGTCAGAAATTAAAAGAACAGATGGCAAATGGTTATTTCTTAGAAGGT
TTTGTACGTTTTAAAGAAGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTA
GGATTTAATGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAAGACGCTT
TCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGACCAATTGGAGTAC
AATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGCCTTGTTAACACAATCAGCGTCT 
TGGGGCTATGTTGATTATGTCAAAAATGGTGGGGAGTTAGAATTAGCACCGGAGAGTCCA 
AAAAGAATTATTTTAGGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTG
GAAAGAGATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATAGG
GACGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGATATTTCTGCTCAAGTT 
CTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGTTTTACCATCTTATCGTAAAAAT 
TTCCATAATAATCCAAAGCAAAGTGATGGTCATTATCGTATGGATGCTCTTCAGTGGAGT 
GGTTTAGATAAGGATGGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTAC
ACACCAGTAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTACAAGTAAGTACT
AAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAATCGAACATTAAGCTTA
GCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCGTTTACAATTAGTTTTATCTCAT 
GTTGTAAAAGATGAAGAATATGGGGATGAGACTTCTTACCATTATTTCCATATAGATCAA
GAAGGTAAAGTGACACTTCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGAC 
CCTAAGGCCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAACGGTAAAATTG 
TCTGATCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATAGTAATTTCTAAC
AGTTTCAAATATTTTGATAACTTGAAAAAAGAACCTATGTTTATTTCTAAAAAAGAAAAA
GTAGTAAACAAGAATCTAGAAGAAATAATATTAGTTAAGCCGCAAACTACAGTTACTACT
CAATCATTGTCTAAAGAAATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAAC
AATAATAGTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTAAC
CATACCTTACCTAGTACATCAGATAGAGCAACGAATGGTCTATTTGTTGGTACTTTGGCA
TTGTTATCTAGTTTACTTCTTTATTTGAAACCCAAAAAGACTAAAAATAATAGTAAA

SEQ ID NO. 4402 
STRAIN 090 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAATTGCT
AATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATATTGTTGAAAA 
AACATCTGTAACAGCTGCTTCTGCTAGTAATACAGTGAAAGAAATGGGTG
ATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAGAGTTA
TCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAAGAATA
TCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGTAACAA
ATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCGTATGAAGAGGTG 
AAGCCAGAAAGCAAGTCATCGCTTGCTGTTTTTGATACATCTAAAATAAC 
AAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAATGTAGTAGCTATTA
TTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATAGCCCA
AAAGATGATAAGCACAGCTTTAAAACTAAAGCAGAATTCGAGGAATTAAA
AGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAGATTGTTT
TTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGATATTGCAGCA
GCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTCGCATGGTAC
ACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCAATCAATG
GTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAATGCGT 
ATTCCAGATAAAATTGATTCGGACAAATTTGGAGAAGCATATGCTAAAGC
AATCACAGACGCTGtTAATCTAGGAGCAAAAaCGATTAATATGAGCCTTG
GAAAAACAGCAGATTCTTTAAttGCaCTCAATGATAAAGTTAAATTAGCA
CTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCCGGAAA
TGAAGGTGCATTTGGTATGGATTATAGCAAACCATTATCAACTAATcCTG
ACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTtTGAGTGTT
GCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAACAACTAT 
TGaaGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCTTTtGACA 
AAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAaAAAAAGAC
TTTGAAGGTAAgGACTTTAAAGGTAAGATTGCATTAATtGAGCGTGGtGG
TGGACTTGATTTTATGAGTAAaatCACTcATGCTACAAATGCAgGTGTTG
tTGGTaTCGTtATTtttAACgAtCAAGAaaAACGtGGAAATTTTcTAATT 
CCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAGTAGATGGCGAGCG 
TATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTgAAGTAG
TTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTTGGGGCGTG
ACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTTTGAAAT
TTATTCTTCAACCTATAATAATCAATACCAAACAATGTCTGGTACAAGTA
TGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGTCATTTG
GCTGAGAAATATAAAGGGATGAATTTAgATTCTAAAAAATTGCTAGAATT
GTCTAaAAACATCCTCATGAGCTCAGCaaCAGCATTATATAGTgAAGAgG 
ATAAGGCGTtTtATTCaCCACGTCAGCAAGGtGCAGGtGTAGTTGATGCT 
GAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGATGGCAAAGC
TAAAATTAATCTCAAACGAGTGGGAGATAAATTTGATATCACAGTTACAA
TTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCTAATGTA 
GCAACAGAACaAGTAAATAAAGGTAAATTTGCCCTTAAACCACAAGCCtT 
GCTAGATACTAATTGGCAGAaAGTAATTCTTcGTGATAAAGAAACACAAG
TTcGATTTACTATTGATGCTAGTCAATTTAGTCAGAAATTAAAAGAACAG
ATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTTTTAAAGAAGCCAA
GGATAGtAATCAGGAGTTAaTGAGTATTCCTTtTGTAGGATTTAATGGTG
ATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAAGACGCTTTCT 
AAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGACCAATT
GGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGCCTTGT
TAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAATGGTGGG 
GAGTTAGAATTAGCACCGGAgAGTcCAAAAAGAATTATTTTAgGAACTTT
TGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAGATGCAG
CgAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATAGGGAT
GAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGATATTTCTGC
TCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGTTTTAC 
CATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAGTGATGGTCAT
TATCGTATGGATGCCTTTCAGTGGAGTGGTTTAGATAAGGATGGCAAAGT 
TGTAGCAGATGGTTTTTATACTTATCGCCTACGTTACACACCAGTAGCAG 
AAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTTCAAGTAAGTACTAAG
TCACCAAATCTTCCTTTACTAGCTCAGTTTGATGAAACTAATCGAACATT
AAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCGTTTAC
AATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGATGAGACT
TCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACTTCCTAA 
AACGGTTAAGATAGGAGAGAGTGAGGTTGCAGTAGACCCTAAGGCCTTGA 
CACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTAAAATTGTCT 
GACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATAGTAAT 
TTCTAACAGTTTCAAATATTTTGATAACTTGAAAAAAGAATCTATGTTTA
TTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAATAACATTA
GTTAAGCCGCAAACTACAGTTACTACTCAATCATTGTCTAAAGAAATAAC
TAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAGTAGCA
GAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTAACCAT
ACC 

SEQ ID NO. 4403 
STRAIN A909 

GAGGAGCAAGAATTAAAAAACCAAGAGCAAT

CACCTGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACT 
AATACTGTTGAAAAAACATCTGTAACATCTGCTTCTGCTAGTAATACAGC 
GAAAGAAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAAT
TATTAGAAGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGAT
CTTGAAGAAGAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAG
CAATGTAGTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAG
CATATGAAGAGGTGAAGCCAGAAAGCAAGTCATCACTTGCTGTTCTTGAT
ACATCTAAAATAACAAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAA
TGTAGTAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTC
GTTTAGATAGCCCAAAAGATgaTAAGCACAGCTTTAaAACTAAGGCAGAA
TTTGAGGAATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAA
CGATAAGATTGtTTTTGCACATAACTACGCCAaCAATACAGAAACGGTGG
CTGATATTGCAGCAGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAAT 
ATTTCGCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACG 
TCCAGCAATCAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAG 
TCTTATTAATGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGTGAA 
GCATATGCTAAAGCAATCACAGACGCTGTTAATCTAGGAGCAAAAACGAT
TAATATGAGCCTTGGAAAAACAGCAGATTCTTTAATTGCTCTCAATGATA
AAGTTAAATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTT
GTGGCTGCCGGAAATGAAGGTGCATTTGGTATGGATTATAGCAAACCATT
ATCAACTAATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAG
ATACTTTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTC 
GTTGAAACAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTC 
TAAACCTTtTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATG 
GTGCAAAAAAAAGACTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATT
AATTGAGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTA
CAAATGCAGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGT
GGAAATTTTCTAATTCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAA
AGTAGATGGCGAGCGTATAAAAAATACTTCAAGTCAGTTAACATTTAACC
AGAGTTTTGAAGTAGTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAA
TCAAGTTGGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGC
TTCTGGCTTTGAAATTTATTCTTCAACCTATAATAATCAATACCAAACAA
TGTCTGGTACAAGTATGGCTTCACCACATGtTGCAGGATTAATGACAATG
CTTCAAAGTCATTTGGCTGAGaAATATAAAGGGATGAATTTAGATTCTAA
AAAATTGCTAGaATTGTCTAAAAACATcCTCATGAGCTCAGCAACAGCAT
TATATAGTGAAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCA
GGTGTAGTTGATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGG 
AAACGATGGCAAAGCTAAAATTAATCTCAAACGAGTGGGAGATAAATTTG
ATATCACAGTTACAATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTAT
TATCAAGCTAATGTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCT
TaAACCaCAAGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTG 
ATAAAGAAACACAAGTTCGATTTACTAtTGATTCTAGTCAATTTAGTCAG 
AAATTAAAAGAACAGATGGCAAATGGTTATTTCTTAGAAGGTTTTGTACG 
TTTTAAAGAAGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTG
TAGGATTTAATGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATT
TATAAGACGCTTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAAC
TCATAAAGACCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACA
ACTATACTGCCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTAT
GTCAAAAATGGTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAAT
TATTTTAGGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTT
TGGAAAGAGATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAA
GATGGAAATAGGGATGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGT
TAAGGATATTTCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGC
AAAGTAAGGTTTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAG
CAAAGTGATGGTCATTATCGTATGGATGCCCTTCAGTGGAGTGGTTTAGA
TAAGGATGGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGTTTACGTT 
ACACACCAGTAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTT
CAAGTAAGTACTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGA
AACTAATCGAACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTC
CTACATATCGTCTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAA
TATGGAGATGAGACTTCTTACCATTATTTCCATATAGATCGAGAAGGTAA
AGTGACACTTCCTAAAACAGTTAAGATAGGAGAGAGTGAGGTTGCAGTAG 
ACCCTAAGACCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCA 
ACGGTAAAATTGTCTGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGA 
AAACGCTATAGTAATTTCTAACAATTTCAAATATTTTGATAACTTGAAAA 
AAGAACCTATGTTTATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTA
GAAGAAATAGCATTAGTTAAGCCGCAAACTACAGTTACTACTCAATCATT
GTCTAAAGAAATAACTCAATCAGGAAATGAGAAAGTCCTCACTTCTACAA
ACAATAATAGTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGG
GATTCTGTTAACCATACC

SEQ ID NO. 4404 
STRAIN H36B 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAATTGC
TAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATACTGTTGAAA
AAACATCTGTAACATCTGCTTCTGCTAGTAATACAGCGAAAGAAATGGGT 
GATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATTAGAAGAGTT
ATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAAGAAT
ATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAATGTAGTAACA
AATGCTTCAACTGCAATAGCACAGAAaGTTCCCTCAGCATATGAAGAGGT
GAAGCCAGAAAGCAAGTCATCACTTGCTGTTCTTGATACATCTAAAATAA
CAAAATTGCAAGCCATAACCCAAAGAGGAAAGGGAAATGTAGTAGCTATT
ATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATAGCCC
AAAAGATGATAAGCACAGCTTTAAAACTAAGGCAGAATTTGAGGAATTAA
AAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAGATTGTT
TTTGCACATAACTACGCCAaCAATACAGAAACGGTGGCTGATATTGCAGC
AGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTCGCATGGTA
CACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCAATCAAT
GGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAATGCG 
TATTCCAGATAAAATTGATTCGGACAAATTTGGTGAAGCATATGCTAAAG
CAATCACAGACGCTGTTAATCTAGGAGCAAAAACGATTAATATGAGCCTT
GGAAAAACAGCAGATTCTTTAATTGCTCTCAATGATAAAGTTAAATTAGC
ACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCCGGAA 
ATGAAGGTGCATTTGGTATGGATTATAGCAAACCATTATCAACTAATCCT 
GACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTTGAGTGT
TGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGAAACAACTA
TTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCTTtTGAC 
AAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAAAAAAAGA 
CTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAGCGTGGTG
GTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAGGTGTT
GTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTCTAAT 
TCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAGTAGATGGCGAGC 
GTATAAAAAATACTTCAAGTCAGTTAACATTTAACCAGAGTTTTGAAGTA
GTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTTGGGGCGT
GACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTTTGAAA
TTTATTCTTCAACCTATAATAATCAATACCAAACAATGTCTGGTACAAGT
ATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGTCATTT
GGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAATTGCTAGAAT
TGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATATAGTGAAGAG
GATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGTAGTTGATGC 
TGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAAACGATGGCAAAG
CTAAAATTAATCTCAAACGAGTGGGAGATAAATTTGATATCACAGTTACA
ATTCATAAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCTAATGT 
AGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTAAACCaCAAGCCT 
TGCTAGATACTAATTGGCAGAAAGTAATTCTTCGTGATAAAGAAACACAA 
GTTCGATTTACTATTGATTCTAGTCAATTTAGTCAGAAATTAAAAGAACA
GATGGCAAATGGTTATTTCTTAGAAGGTTTTGtACGTTTTAAAGAAGCCA
AGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATTTAATGGT
GATTTTGCGAACTtACAAGCACTTGAAACACCGATTTATAAGACGCTTTC 
TAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATAAAGACCAAT
TGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGCCTTG
TTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAATGGTGG
GGAGTTAgAATTAgCACCGGAGAGTCCAAAAAGAATTATTTTAGGAACTT
TTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAGATGCA
GCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAATAGGGA
TGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGATATTTCTG
CTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGTTTTA
CCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAGTGATGGTCA
TTATCGTATGGATGCCCTTCAGTGGAGTGGTTTAGATAAGGATGGCAAAG
TTGTAGCAGATGGTTTTTATACTTATCGTTTACGTTACACACCAGTAGCA 
GAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTTCAAGTAAGTACTAA
GTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAATCGAACAT
TAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATCGTCTA
CAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGAGATGAGAC
TTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACTTCCTA
AAACAGTTAAGATAGGAGAGAGTGAGGTTGCAGTAGACCCTAAGACCTTG
ACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAACGGTAAAATTGTC 
TGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATAGTAA
TTTCTAACAATTTCAAATATTTTGATAACTTGAAAAAAGAACCTATGTTT
ATTTCTAAAGAAGGAAAAGTAGTAAACAAGAATCTAGAAGAAATAGCATT
AGTTAAGCCGCAAACTACAGTTAGTACTCAATCATTGTCTAAAGAAATAA
CTCAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAGTAGC 
AGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTCTGTTAACCA
TACC 

SEQ ID NO. 4405 
STRAIN 18RS21 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACC

TGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATA
CTGTTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGCGAAA
GAAATGGGTGATACATCTGTAAAAAATGACAAAACAGAAGATGAATTATT
AGAAGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTG
AAGAAGAATATCCCTCTAAACCAGAGACAACCAACAATAAAGAAAGCAAT
GTAGTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGCATA
TGAAGAGGTGAAGCCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACAT
CTAAAATAACAAAATTACAAGCCATAACCCAAAGAGGAAAGGGAAATGTA 
GTAGCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTT 
AGATAGCCCAAAAGATGATAAGCACAGCTTTAAAACTAAGACAGAATTTG
AGGAATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGAT
AAGATTGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGA
TATTGCAGCAGCTATGAAAGATGGTTATGGTTCAGAAGCAAAGAATATTT
CGCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCA 
GCAATCAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTT 
ATTAATGCGTATTCCAGATAAAATTGATTCGGACAAATTTGGTGAAGCAT
ATGCTAAAGCAATCACAGACGCTGTTAATCTAGGAGCAAAAACGATTAAT
ATGAGTATTGGAAAAACAGCTGATTCTTTAATTGCTCTCAATGATAAAGT
TAAATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGG
CTGCCGGAAATGAAGGCGCATTTGGTATGGATTATAGCAAACCATTATCA 
ACTAATCcTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATAC 
TTTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTG
AAACAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAA 
CCTTTTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGC 
AAAAAAAGACTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTG
AGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAAT
GCAGGTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAA 
TTTTCTAATTCCTTACCGTGAATTACCTGTGGGGATTATTAgTAAAGTAG 
ATGGCGAGCGTATAAAAAATACTTCAAGTCAGTTAACATTtAACCAgAGT
TTTGAAGtAGTTGATAGCCAAGGTGGtAATCGTaTGCTGGAACAATCAAG 
TTGGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTG 
GCTTTGAAATTTATTCTTCAACCTATAATAATCAATACCAAaCAATGTCT
GGTACAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCA
AAGTCATTTGGCTGAGAAATATAAAGGGATGAATTTAGATTCTAAAAAAT
TGCTAGAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATAT
AGTGAAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGT
AGTTGATGCTGAAAAAGCTATCCAAGCTCaATATTATATTACTGGAAACG
ATGGCAaAGCTAAAATTAATCTCAAACGAATGGGAGATAAATTTGATATC
ACAGTTACAATTCATaAACTTGTAGAAGGTGTCAAAGAATTGTATTATCA
AGCTAATGTAGCAACAGAACAAGTAAATAAAGGTAAATTTGCCCTTaAAC
CACAAGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTGATAAA
GAAACACAAGTTCGATTTACTATTGATGCTAGTCAATTTAGTCAGAAATT
AAAAGAACAGATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTTTTA
AAGAAGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGA
TTTAATGGTGATTTTGCGAACTTACAAGCACTTGAAACACCGATTTATAA 
GACGATTTCTAAAGGTAGTTTCTACTATAAACCAAATGATACAACTCATA 
AAGACCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTAT 
ACTGCCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAA 
AAATGGTGGGGAGTTAGAATTAGCaCCGGAGAGTCCAAAAAGAATTATTT
TAGGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAA
AGAGATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGG
AAATAGGGACGAAATCACTCCCCAGGCAACtTTCTTAAGAAATGTTAAGG
ATATTTCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGT
AAGGTTTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCAAAG
TGATGGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAGATAAGG
ATGGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTACACA 
CCAGTAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTACAAGT
AAGTACTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTA
ATCGAACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACA 
TATCGTTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGG 
GG AT GAG AC T T CT T AC CAT TAT T T C C AT AT AGAT C AAGAAG GT AAAGT GA 
CACTTCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCT 
AAGGCCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGcAACGGT 
AAAAT T GT C T GAT C T C T T G AAT AAG G C AGT AG TAT C AGAG AAAGAAAAC G 
C T AT AGT AAT T T C T AAC AG T T T C AAAT AT T T T GAT AAC T T G AAAAAAG AA 
C C T AT GT T T AT T T C T AAAAAAGAAAAAGT AG T AAAC AAGAAT C T AGAAGA 
AAT AAT AT T AGT T AAG C C G C AAACT AC AGT T AC T ACT CAAT CAT T GT C T A 
AAGAAAT AAC T AAAT C AG G AAAT GAG AAAGT C C T C AC T T CT AC AAAC AAT 
AATAGTAGCAGAGTAGCTAAGATCATATCACCTAAACATAACGGGGATTC 
TGTTAACCATACC 

SEQ ID NO. 4406 
STRAIN M732 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCT

GTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATAT 
T GT T G AAAAAAC AT C T GT AAC AG CTGCTTCTG C T AG T AAT AC AGT G AAAG 
AAAT G G GT G AT AC AT CT G T AAAAAAT GAC AAAAC AG AAGAT G AAT TAT T A 
GAAGAGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGA 
AG AAG AAT AT C CCT CT AAACCAGAGACAACCAAC AATAAAGAAAGCAAT G 
T AG T AAC AAAT G C T T C AAC T GC AAT AG CAC AG AAAGT T C C C T C AG CAT AT 
GAAGAGGTGAAGTCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACATC 
T AAAAT AAC AAAAT T AC AAG C CAC AAC C C AAAG AG GAAAGGG AAAT G TAG 
T AGC T AT T AT T GAT ACT G G C T T T GAT AT T AAC CAT GAT AT TTTTCGTTTA 
GAT AG C C C AAAAG AT GAT AAG CAC AG C T T T AAAAC T AAGG C AGAAT T T G A 
G G AAT T AAAAG C AAAAC AT AAT AT C ACT TAT G GG AAAT G G GT T AAC G AT A 
AGAT T GT T T T T G C AC AT AAC T ACG C C AAC AAT AC AG AAAC GG T GG CT GAT 
ATTGCAGCAGCTATGAAAGATGGTTATGGGTCAGAAGCAAAGAATATTTT 
GCATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAG 
CAATCAATAGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTA 
T T AAT G CGT AT T C C AG AT AAAAT T GAT T C GG AC AAAT T T G GAGAAG CAT A 
T G C T AAAG CAAT CAT AG AC G C T G T T AAT C T AG G AG C AAAAAC GAT T AAT A 
TGAGCCTGGGAAAAACGGCTGATTCTTTAATTGCTCTCAATGATAAAGTT 
AAAT TAG CAC T T AAAT T AG CT T C T GAGAAG G GC GT T G C AGT TGTTGTGGC 
T G C C GGAAAT G AAGGT G CAT T T GGT AT G GAT TAT AG C AAAC CAT TAT C AA 
CTAATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACT 
TTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGA 
AAC AAC TAT T G AAGGT AAGT TAG T T AAGT T G C C GAT T G T GAC T T CT AAAC 
CT T t T GAC AAAG G T AAG GC C T AC GAT GT GGT T T AT G C CAAT TAT GGT G C A 
AAAAAGAT T T T G AAGG T AAG GAC T T T AAAG GT AAG AT T G CAT T AAT T GAG 
CGTGGTGGTG G ACT T GAT T T TAT G AC T AAAAT C ACT CAT G C T AC AAAT G C 
AG G T GT T GT T GGT AT CGT TAT T T T T AACG AT C AAGAAAAACGT G G AAAT T 
TTCTAATTCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAGTAGAT 
G G C GAG C GT AT AAAAAAT AC T T C AAGT C AGT T AAC AT T T AAC C AG AGT T T 
T G AAG T AGT T GAT AG C C AAG G T GGC AAT C GT AT GC T G GAAC AAT C AAGT T 
GGGGCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGC 
T T T G AAAT T T AT T CT T C AAC C TAT AAT AAT CAAT AC T AAAC AAT GT C T G G 
T AC AAGT AT G G C T T CAC CAC AT GT T G C AG GAT T AAT GAC AAT G C T T C AAA 
G T CAT T T G G C T GAG AAAT AT AAAGGG ATGAAT T TAG AT T C T AAAAAAT T G 
C TAG AAT T G T C T AAAAAC AT CCT CAT GAG C T C AG C AAC AG CAT TAT AT AG 
T G AAG AGG AT AAGG C GT T T T AT T CAC CAC GT C AG C AAGGT G C AGGT G T AG 
T T GAT G CT G AAAAAG C T AT C C AAG C T CAAT AT TAT G T T AC T GG AAACG AT 
G G C AAAGT T AAAAT T AAT C T C AAAC G AG AGGG AG AT AAAT T T GAT AT CAC 
AGTT AC AAT T CAT a AAC TTGT AG AAGGT GT C AAAG AAT T GT AT T AT C AAG 
CTAATGTAGCAACAGAaCAAGTAAATAAAGGTAAATTTGCCCTTaAACCA 
C AAG C C T T G C T AGAT AC T AAT T GG C AG AAAG T AAT T C T T CGT GAT AAAG A 
AAC AC AAGT T C GAT T TACT AT T GAT G CT AGT CAAT T T AGT C AG AAAT T AA 
AAG AAC AG AT G G C AAAT GGT TAT T T C T TAG AAGG T T T T GT ACGT T T T AAA 
G AAG C C AAG GAT AG T AAT C AG G AGT T AAT G AGT AT T C C T TT T GT AGG AT T 
T AAT GGT GAT T T T G C G AACT T AC AAG C AC T T G AAAC a C C GAT T T AT AAGA
C GCT T T CT AAAGGT AGT T T C T ACT AT AAAC C AAAT G AT AC AAC T CAT AAA 
GACCAATT GGAGT ACAATGAAT CAGCT CCTT TTGAAAGCAAC AACT AT AC 
T G C C T T GT T AAC AC AAT CAGCG T CT T GGGG CT AT GT T GAT TAT GT C AAAA 
AT GGT G G GG AGTT AGAAT T AG C AC C G G AGAGT C C AAAAAGAAT TAT T T T A 
G G AACT T T T GAG AAT AAG GT T GAG GAT AAAAC AAT T CAT C T T T T G G AAAG 
AG AT G CAG C GAAT AAT C CAT AT T TT G C C ATT T CT C C AAAT AAAGAT GG AA 
ATAGGGACGAAATCACTCCCCAGGCAACTTTCTTAAGAAATGTTAAGGAT 
AT T T C T GCT C AAGT T CT AG AT CAAAAT GG AAAT GT TAT T T GG C AAAGT AA 
GGT T T T AC CAT C T TAT C GT AAAAAT T T C CAT AAT AAT C C AAAG C AAAGT G 
ATGGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAGATAAGGAT 
G GC AAAGT T GT AGC AG AT G GT T T T TAT AC T TAT C GCT T AC GT T AC AC AC C 
AG TAG C AGAAGG AG C a AAT AGT C AGG AGT CAG AC T T T AAAGT T C AAGT AA 
GT AC T AAGT C AC C AAAT C T T C CT T C AC GAG C T C AGT T T GAT G AAAC T AAT 
CG AAC AT T AAG C T TAG C CAT G C C T AAG GAAAG T AGT TAT GT T C C T AC AT A 
TCGTTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGG 
AT GAGACTT CTT ACCATT AT TT C CAT AT AGAT CAAGAAGGT AAAGTGACA 
CTTCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCTAA 
GG C CT T GAC AC T T GT T GT GG AAG AT AAAG C T G GT AAT T T T GC AAC GGT AA 
AATTGTCTGACCTCTTGAATAAGGCAGTAGTATCAGAGAAAGaAAACGCT 
ATAGTAATTT CTAACAGT T TC AAAT ATT TT GAT AAC TTGAAGAAAGAACC 
TAT GT T T AT T T CT AAAG AAGGAAAAGT AGT AAAC AAG AAT C T AGAAG AAA 
T AAC AT T AGT T AAG C C T C AAACT AC AGT T ACT AC T C AAT CAT T GT CT AAA 
GAAAT AACT AAAT CAGGAAATGAGAAAGTCCTCACTTCT AC AAAC AAT AA 
T AGT AGC AG AG T AG CT AAG AT CAT AT C AC C T AAAC AT AACG G GG AT T C T G 
TTAACCATACC 

SEQ ID NO. 4407 
STRAIN COH1 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGT
AATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTaACTACTAATATTG 
T TGAAAAAAC AT CTGT AACAGCTGCTT CT G CT AGT AAT AC AGT GAAAGAA 
ATGGGt gAT AC ATCT GT AAAAAAT GACAAAACAGAAGAT GAAT TAT TAG A 
AG AGT TAT C T AAAAAC CTT G AT AC GT C T AATT TGGGGGCT GAT CTT G AAG 
AAG AAT AT C C C T CT AAAC C AG AGa CAAC C AAC AAT AAAG AAAG C AAT GT A 
GT AAC AAATGCTT C AACT GCAATAGCACAGAAAGTTCCCT CAG CAT ATGA 
AG AG GT GAAGT CAG AAAG C AAGT CAT CGCTTGCTGTTCTT GAT AC AT C T A 
AAAT AAC AAAAT T AC AAG C C AC AAC C C AAAG AGG AAAGG GAAAT GT AGT A 
G C TAT TAT T GAT ACT G G CT T T GAT AT T AAC CAT GAT AT TTTTCGTT T AGA 
TAG C C C AAAAG AT G AT AAGC AC AG CTT T AAAAC TAAGG CAG AATT T GAG G 
AA t T AAAAG C AAAAC AT AAT AT C AC T TAT GGG AAAT GGGT T AAC GAT AAG 
AT TGTTTTTG C AC AT AACT AC G C C Aa C AAT AC AG AAAC GG T GG CT GAT AT 
T GC AG CAG C TAT GAAAG AT G GT TAT GGGT CAG AAG C AAAG AAT AT T T T G C 
ATGGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCA 
ATCAATAGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATT 
AAT G CGT AT T C CAG AT AAAAT T GAT T C GG AC AAAT T T G GAG AAGC AT AT G 
CT AAAG C AAT C AT AGAC G C T GT T AAT C T AG G AG C AAAAAC GAT T AAT AT G 
AGCCTGGGAAAAACGGCTGATTCTTTAATTGCTCTCAATGATAAAGTTAA 
AT TAG C AC T T AAAT TAG CTT CT G AG AAG GGCGT T G C AGT T GT T GT GG C T G 
C C G GAAAT GAAGG T G CAT T T GGT AT GG AT TAT AG C AAAC CAT TAT CAAC T 
AAT C C T G AC T AC GGT ACGGT T AAT AGT C C AGC TAT T T C T G AAG AT AC T T T 
GAGT GT T G C T AG CT AT GAAT C AC T T AAAAC TAT C AGT G AG GT C GT T G AAA 
CAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACTTCTAAACCT 
TtTGACAAAGGTAAGGCCTACGATGT GGT TTATGCC AATT ATGGTGCAAA 
AAAGAT T T T GAAGG TAAGG ACT T T AAAGG T AAG AT T GC AT T AAT T GAG C G 
TGGTGGTG G AC T T GAT T T T AT G AC T AAAAT C ACT CAT GCT AC AAAT G C AG 
GTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTT 
CTAATTCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAGTAGATGG 
CG AG CGT AT AAAAAAT ACT T C AAGT CAG T T AAC AT T T AAC CAG AG T T T T G 
AAGT AG T T GAT AG C C AAG GT G G C AAT CGT AT GCT GG AAC AAT C AAGT T G G 
GGCGT GAC AGCTGAAGGAGC AAT CAAGCCTGATGT AAC AG CTT CTGGCTT 
TGAaATTTATTCTTCAACCTATAATAATCAATACTAAACAATGTCTGGTA 
C AAGT AT G G CT T C AC C AC AT GT T G C AGG AT T AAT GAC AAT G CT T C AAAGT 
CAT T T GG C T GAG AAAT AT AAAG G GAT GAAT T TAG AT T CTAa AAAAT T G C T 
AGaATTGTCTAaaAACATCCTCATGAGCTCAGCAACAGCATTATATAGTG
AAGAGG AT AAGG CGT T T T AT T C AC C AC GT C AG C AAG GT G GAG G T GT AGT T 
GAT G C T G AAAAAG CT AT CC AAG C T C AAT AT TAT GT T AC T GGAAACGAT GG 
C AAAGT T AAAAT T AAT CT C AAAC G AG AG GGAGAT AAAT T T GAT AT C AC AG 
TTACAATTCATaAACTTGTAGAAGGTGTCAAAGAATTGTATTATCAAGCT 
AAT GT AG C Aa C AG AAC AAG T AAAT AAAG GT AAAT T T GC C C T T AAAC C AC A 
AG C C T T G CT AGAT AC T AAT T G G C AG AAAG T AAT T CT T c GT GAT AAAGAAA 
C AC AAGT T C GAT T TACT AT T G AT GC T AGT C AAT T T AGT C AG AAAT T AAAA 
GAAC AGAT GG C AAAT G GT T AT T T C T T AG AAG GT T T T G T ACGT T T T AAAG A 
AG C C AAGGAT AG T AAT C AGG AG T T AAT GAGT AT T C CT T T T G T AG GAT T T A 
AT G GT GAT T T T G C GAAC T T AC AAG C AC T T G AAAC AC C GAT T TAT AAG AC G 
C T T T C T AAAGG T AGT T T C TACT AT AAAC C AAAT GAT AC AAC T CAT AAAG A 
C C AAT T GG AGT AC AAT G AAT C AG CTCCTTTT G AAAG C AAC AACT AT AC T G 
CCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAAT 
GGTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAATTATTTTAGG 
a AC T T T T GAG AAT AAG G T T GAG G AT AAAAC AAT T CAT C T T T T GG AAAG AG 
ATGCAGCGAATAATCCATATTTTGCCATTTCTCCAAATAAAGATGGAAAT 
AGGGACG AAAT C AC T C C C C AG G C a AC T T T CT T AAGAAAT G T T AAGG AT AT 
T TCTGCTC AAG tTCT AGAT CAAAATGGAAATGTT AT TTGGCAAAGT AAGG 
T T T T AC CAT C T TAT CGT AAAAAT T T C CAT AAT a AT C C AAAG C AAAG T GAT 
GGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAgATAAGGATGG 
C AAAG T T GT Ag C AG AT GG t T T T TAT AC T TAT CG C T T ACGT T AC AC AC C AG 
TAG C AG AAGG AG C AAAT AGT C AG GAGT C AG AC T T T a AAGT T C AAGT AAGT 
AcTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGaAACTAATCG 
AACATTAAGCTTAGCCATGCCTAAGGAAAGTAGTTATGTTCCTACATATC 
G T T T AC AAT T AGT T T TAT C T CAT GT T GT AAAAG AT G AAG AAT AT G G G GAT 
GAG ACT T CT TAG CAT TAT T T CCAT AT AGAT CAAGAAGGT AAAGT G AC ACT 
T C CT AAAAC G GT T AAG AT AG GAGAG AGT G AGG T T G C G GT AG AC C C T AAGG 
CCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTAAAA 
T T GT C T G AC C T C T T G AAT AAG G C AGT AGT AT C AG AG AAAGAAAAC G C T AT 
AGT AAT T T C T AAC AG T T T C AAAT AT T T T GAT AAC T T GAAGAAAG AAC C T A 
T G T T TAT T T C T AAAG AAG G AAAAG TAG T AAAC AAGAAT C TAG AAG AAAT A 
AC AT T AGT T AAG C C T C AAAC T AC AG T T AC T AC T C AAT CAT T GT CT AAAG A 
AAT AACT AAAT C AGGAAAT GAG AAAGT C C T C AC T T C T AC AAAC AAT AAT A 
G T AG C AG AGT AG C T AAG AT CAT AT C AC CT AAAC AT AAC G G G G AT T CT G T T 
AACCATACC 

SEQ ID NO. 4408 
STRAIN M781 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGT
AATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATATTG 
T T G AAAAAAC AT C T GT AAC AG CT G C T T C T GC T AGT AAT AC AGT G AAAG AA 
AT G GGT GAT AC AT C T GT AAAAAAT G AC AAAAC AG AAG AT G AAT TAT TAG A 
AG AG T TAT CT AAAAAC C T T GAT AC G T C T AAT TTGGGGGCT GAT C T T GAAG 
AAG AAT AT C C CT C T AAAC C AG AG AC AAC C AAC AAT AAAGAAAG C AAT G T A 
GT AAC AAAT G C T T C AAC T G C AAT AG C AC AG AAAG T T C C C T C AGC AT AT G A 
AG AG G T GAAG T C AG AAAG C AAGT CAT CGCTTGCTGTTCTT GAT AC AT C T A 
AAAT AAC AAAAT T AC AAG C C AC AAC C C AAAGAGG AAAGGG AAAT GT AG T A 
GCTATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGA 
TAGCCCAAAAGATGATAAGCACAGCTTTAAAACTAAGGCAGAATTTGAGG 
AATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAACGATAAG 
ATTGTTTTTGCACATAACTACGCCAaCAATACAGAAACGGTGGCTGATAT 
T G C AG C AG C T AT G AAAG AT GG T TAT GG G T C AG AAG C AAAG AAT AT T T T G C 
AT G GT AC AC AC GT T G CT GGT AT T T T T G T AG GT AAT AG T AAACGT C C AG C A 
ATCAATAGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATT 
AAT G CGT AT T C C AG AT AAAAT T GAT T C G G AC AAAT T T G GAG AAG CAT AT G 
C T AAAGC AAT CAT AG AC G C T G T T AAT C T AG G AG C AAAAAC GAT T AAT AT G 
AGCCTGGGAAAAACGGCTGATTCTTTAATTGCTCTCAATGATAAAGTTAA 
ATTAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTG 
C C G G AAAT G AAGGT G C AT T T G G TAT GG AT TAT AG C AAa C CAT TAT C Aa C T 
AATCCTGACTACGGTACGGTTAATAGTCCAGCTATTTCTGAAGATACTTT 
GAGTGTTGCTAGCTATGAATCACTtAAAACTATCAGTGAGGTCGTTGAAA 
CAACTATTGAAGGTAAGTTAGTTAAGTTGCCGATTGTGACtTCTAaACCT 
TTTGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGGTGCAAA 
AAAGATTTTGAAGGTAAGGACTTTAAAGGTAAGATTGCATTAATTGAGCG
TGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAG 
GTGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTT 
cTAATTCCTTACCGTGAATTACCTGTGgGGGTTATTAGTAAAGTAGATGG 
CG AG C G T AT AAAAAAT ACT T C AAGT C AG T T AAC AT T T AAC C AG AGT T TT g 
AAGTAGTTGATAGCCAAGGTGGCAATCGTATGCTGGAACAATCAAGTTGG 
G G C GT G AC AG CT G AAG G AGC AAT C AAG C C T GAT GT AAC AG CTTCTGGCTT 
T G AAAT T TAT T C T T C AAC CT A T AAT AAT C AAT ACT AAAC AAT GT CT GGT A 
CAAGTATGGCTTCACCACATGTTGCAGGATTAATGACAATGCTTCAAAGT 
CAT T T GG CT G AG AAAT AT AAAG G GAT G AAT T TAG AT T C T AAAAAAT T GC T 
AG AAT T GT CT AAAAAC AT C CT CAT G AGCT C AG C AAC AG CAT TAT AT AGT G 
AAGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGTGCAGGTGTAGTT 
GAT G C T GAAAAAGC T AT C C AAG CT C AAT AT TAT GT T AC T GG AAACG AT GG 
C AAAGT T AAAAT T AAT C T C AAAC GAG AG G GAG AT AAAT T T GAT AT C AC AG 
T T AC AAT T CAT a a AC T T GT Ag AAGGT GT C AAAG AAT T GT AT TAT C AAG C T 
AAT GT AGC a a C AG AAC AAGT AAAT AaAGGT AAAT TTGCCCTTaAaCCa C A 
AG C CT T GCT AG AT AC T AAT T G G C AG A a AGT a AT T C T T cGT GAT AAAG AAA 
C AC AAGT T c GAT T TACT At T GAT G CT AG T C AAT T T AGT C AG AAAT T AAAA 
GAACAGATGGCAAATGGTTATTTCTTAGAAGGTTTTGTACGTTTTAAAGA 
AGCCAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTAGGATTTA 
AT GGT GAT T T T GC G AAC T t AC AAG C ACT T G AAAC ACC GAT T TAT AAG AC G 
C T T T C T AAAGGT AGT T T CT AC T AT AAa C C AAAT GAT AC AAC T CAT AAAG A 
CCAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTG 
CCTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAAT 
GG T GG G G AGT T AGAAT TAG C AC C GG AG AGT C C AAAAAG AAT TAT T T T AGG 
AACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAG 
AT G C AG C GAAT AAT C CAT AT T T T G C CAT T T C T C C AAAT AAAG AT GG AAAT 
AG G G AC G a a AT C ACT C C C C AGG C a AC t T T C T T AAG AAAT GT T AAGG AT AT 
T T CT G CT C AAG t T CT AG AT C AAAAT GG AAAT GT TAT T T GG C AAAGT AAG G 
T T T T AC CAT C T TAT C GT AAAAAT T T C CAT AAT a AT C C AAAG C AAAGT GAT 
GGTCATTATCGTATGGATGCTCTTCAGTGGAGTGGTTTAGATAAGGATGG 
CAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTACACACCAG 
TAGCAGAAGGAGCAAATAGTCAGGAGTCAGACTTTAAAGTTCAAGTAAGT 
ACTAAGTCACCAAATCTTCCTTCACGAGCTCAGTTTGATGAAACTAATCG 
AAC AT T AAG C T T AGC CAT G C C T AAGG AAAGT AGT TAT G T T C C T AC At AT C 
GTTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGAT 
GAGACTTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACT 
TCCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCTAAGG 
CCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTAAAA 
T T G T C T G AC C T CT T GAAT AAG G C AGT AG TAT C AG AG AAAG AAAAC GCT AT 
AGT AAT T T C T AAC AG T T T C AAAT AT T T T GAT AAC T T G AAGAAAGAAC C T A 
T G T T TAT T T C T AAAG AAGG AAAAGT AGT AAAC AAG AAT C T AGAAG AAAT A 
AC AT TAG T T AAG C C T C AAACT AC AGT T AC TACT C AAT CAT T G T CT AAAGA 
AAT AAC T AAAT C AGG AAAT GAG AAAGT C C T C AC T T C T AC AAAC AAT AAT A 
GT AG C AGAGT AG C T AAG AT CAT AT C AC C T AAAC AT AAC GGGG AT T CT G T T 
AACCATACC

SEQ ID NO. 4409 
STRAIN CJB110 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAA
TTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATATTGTT 
GAAAAAAC AT CTGT AnCAGCTGCT T CT GCT AGT AAT AC AGCG AAAG AAAT 
GG GT GAT AC AT C T GT AAAAAAT G AC AAAAC AG AAG AT GAAT TAT TAG AAG 
AGT TAT C T AAAAAC C T T G AT AC GT C T AAT wT G G GGG CT G AT C T T G AAG AA 
GAAT AT C C CT CT AAAC CAGAGACAACC AAC AAT AAAG AAAG C AAT GT AGT 
AAC AAAT G C T T C AACT G C AAT AG C AC AG AAAGT T C C C T C AG C GT AT G AAG 
AGGT G a AG C C AG AAAG C AAG T CAT CGCTTGCTGTTTTT GAT AC AT C T AAA 
AT AAC AAAAT T G C AAG C CAT AAC C C AAAG AGG AAAGG G AAAT G T AGT AG C 
TATTATTGATACTGGCTTTGATATTAACCATGATATTTTTCGTTTAGATA 
GC C C AAAAG AT GAT AAG C AC AG CT T T AAAAC T AAAG C AG AAT T C G AGG AA 
t T AAAAG C AAAAC AT AAT AT C ACT TAT G G G AAAT GG GT T AAC GAT AAG AT 
TGTTTTTG C AC AT AAC T AC G C C AAC AAT AC AG AAAC GGT G G C T GAT AT T G 
C AGC AG C T AT G AAAG AT G GT T AT GGGT C AG AAG C AAAG AAT AT T T C G CAT 
GGTACACACGTTGCTGGTATTTTTGTAGGTAATAGTAAACGTCCAGCAAT 
CAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAA
T G CGT AT T C C AGAT AAAAT T GAT T CGGAC AAAT T TGG AGAAGC AT AT G C T 
AAAG C AAT C AC AGACGC T G TT AAT C TAG GAG C AAAAAC GAT T AAT AT GAG 
C CTT GGAAAAAC AG C AGAT T C TT T AAT T G C AC T C AAT G AT AAAGT T AAAT 
TAgCACTTAAATTAGCTTcTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCC 
GG AAAT GAAG GT G CAT T T G GT AT GGAT T AT Ag C AAAC CAT TAT C AACT AA 

TcCTGACTACGGtACGGTTAATAGTCCAGCTATTTcTGAAGATACTTTGA 
GTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCGTTGaAACA 
ACT AT T GAAGGT AAG T T AGT T AAGT T G C CG AT T GT G AC T T c T AAAC C T T T 
T GAC AAAGGT AAGG C CT AC GAT GT GGT T TAT G C C AAT TAT G G T G C AAAAA 
AAGACT T TG AAGGT AAG G AC T T T AAAGGT AAG AT T G C AT T AAT T G AGC GT 
GGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACAAATGCAGG 
TGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTc 
T AAT T CCT T AC CGT GAAT T AC CTGTGgGGGT TAT T AGT AAAGT AGAT GGC 
G AGC G T AT AAAAAAT ACT T C AAGT C AGT T AAC AT T T AAC C AgAGT T T T G A 
AGT AgT T GAT AG C C AAg GT G G C AAT C GT AT G C T GG AAC AAT C AAGT t G GG 
G CGT GAC AG CT G AAGGAG C AAT C AAG CCT GAT G T AAC AG CTTCTGGCTTT 
GAAAT T TAT T CT T C AAC C TAT AAT AAT C AAT AC C AAAC AAT GT C T G G T AC 
AAGT AT G G C T T C AC C AC AT G t T G C AG GAT T AAT GAC AAT G CT T C AAAAT C 
AT T T GGCT GAG AAAT AT AAAGG GAT GAAT T TAG ATT C T AAAAAAT T G C T A 
GAATTGTCTAAAAACATCCTCATGAGCTCAGCAACAGCATTATATAGTGA 
AGAGGATAAGGCGTTTTATTCACCACGTCAGCAAGGtGCAGGTGTAGTTG 
AT GCT GAAAAAG C TAT C C AAG C T C AAT AT T AT GT T ACT GGAAACG AT GGC 
AAAGCTAAAATT AAT CT CAAACGAGTGGGAGATAAAT T TGATAT C AC AGT 
T AC AAT T CAT AAAC T T GT AG AAGGT GT C AAAGAAT T GT AT TAT C AAG CT A 
AT GT AGC AAC AGAACAAG T AAAT AAAGGT AAAT TT GC C CTT a AAC C AC AA 
G C CT T G C TAG AT ACT AAT TGG C AG AAAGT AAT T C T T c G T GAT AAAG AAAC 
AC AAGT T C GAT T T AC T A t T GAT G CT AGT C AAT T T Ag T C AG AAAT T AAAAG 
AACAGATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTTTTAAAGAA 
G C C AAG GAT AGT AAT C AGG AGT T AAT GAGT AT TCCTTTTG TAG GAT T T AA 
T G GT GAT T T T GC G AAC T t AC AAG C AC T T G AAAC AC CG AT T TAT AAG AC G C 
T T T CT AAAGGT AGT t T C T AC TAT AAAC C AAAT GAT AC AAC T CAT AAAG AC 
C AAT T GG AGT AC AAT GAAT C AG CT C c t T T T G AAAG C AAC AAC TAT ACT G C 
CTT GT T AAC AC AAT C AG CGT CT T GGG GC T AT GT T GAT TAT G T C AAAAAT G 
GTGGGGAGTTAGAATTAGCACCGGAGAGTCCAAAAAGAATTATTTTAGGA 
ACT T T T GAG AAT AAG G T T G AGG AT AAAAC AAT T CAT C T T T T G G AAAG AG A 
T G C AG CGAAT AAT C CAT AT T T T G C C AT T T CT C C AAAT AAAG AT GG AAAT A 
GGGATGaaATCACTCCCCAGGCAACtTTCTTAAGAAATGTTAAGGATATT 
TCTGCTCAAGTTCTAGATCAAAATGGAAATGTTATTTGGCAAAGTAAGGT 
T T T AC CAT C T TAT C G T AAAAAT T T C CAT AAT AAT C C AAAG C AAAGT GAT G 
GT CAT TAT C GT AT G GAT G C C T T T C AGT G GAGT GGT T T Ag AT AAgG AT GGC 
AAAGT T GT AG C AG AT GGT T T T TAT AC T TAT CG C C T AC GT T AC AC AC C AGT 
AG C AG AAgG AG C AAAT AGT C AGG AGT C Ag ACT T T AAAGT T C AAGT AAGT A 
CT AAGT C AC C AAAT C T T C CT T T AC T AG CT C AGT T T GAT G AAAC T AAT C GA 
AC AT T AAG CT TAG C C ATG C C T AAG G AAAGT AGT T AT GT T C C T AC AT AT C G 
TTTACAATTAGTTTT AT CTCATGTTGT AAAAG ATGAAGAATATGGGGATG 
AGAC T T C T T AC CAT TAT T T C CAT AT AGAT C AAGAAGG T AAAG T GAC AC T T 
C CT AAAAC GGT T AAG AT AGG AG AGAG T GAG GT T G C AGT AG AC C CT AAGG C 
CTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTTGCAACGGTaAAAT 
T GT C T G AC CT CT T G Aa T AAg GC AGT AGT AT C AG AG AAAG AAAAC G C TATA 
GT AAT T T CT AAC AGT T T C AAAT AT T T T GAT AACT T GAAAAAAG AAT C TAT 
GT T TAT T T C T AAAG AAGG AAAAG T AGT AAAC AAG AAT CT AG AAGAAAT AA 
CAT T AGT T AAG C C GC Aa AC T AC AGT T AC T AC T C AAT CAT T GT C T AAAG AA 
ATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAG 
TAG C AGAGT AG C T AAG AT CAT AT C AC C T AAAC AT AAC G G GG AT T C T GT T A 
ACCATACC 

SEQ ID NO. 4410 
STRAIN 1169NT 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATC

ACCTGTAATTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTA 
ATATTGTTGAAAAAACATCTGTAACAGCTGCTTCTGCTAGTAATACAGCG 
AAAG AAAT G GGT GAT AC AT C T GT AAAAAAT GAC AAAAC AG AAG AT GAAT T 
ATTAGAAGAGTTATCTAAAAACCTTGATACGTCTAATATGGGGGCTGATC 
TTGAAGAAGAATAT CCCTCT AAAC C AG AGAC AAC C AAC AAT AAGGAAAG C
AATGTAGTAACAAATGCTTCAACTGCAATAGCACAGAAAGTTCCCTCAGC 
ATATGAAGAGGTGAAGCCAAAAAGCAAGT CAT CGCTTG CT GTTCTTGAT A 
C AT CT AAAAT AAC AAAATT G C AAG C C AT AACC C AAAG AGGAAAGG GAAAT 
GT AGT AGCT AT TAT T GAT AC T GG C T T T GAT AT T AAC CAT GAT AT T T T T CG 
T T T AGAT AG C C C AAAAG AT GAT AAGC AC AG CT T T AAAAAT AAGG C AG AAT 
TCGAGGAATTAAAAGCAAAACATAATATCACTTATGGGAAATGGGTTAAC 
GATAAGATTGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGC 
T GAT AT T G C AG C AG CT AT GAAAg AT GGT T AT G GT T C AG AAGC AAAGAAT A 
T T T C G CAT GGT AC AC AC GT T GCT G GT AT T t T T GTAGGT AAT AGT AAAC GT 
CCAGCAATCAATGGTCTTCTTTTAgAAGGTGCAgCGCCAAATGCTCAAGT 
C T TAT T AAT GC GT AT T C C AG AT AAAAT t GAT T CG G AC AAAT T t G G AGAAG 
CAT AT GC T AAAG C AAT C AC AGAC G CT GT T AAT C TAG GAG CT a AAAC GAT T 
AAT AT G AGT AT T G GAAAAAC AG CT GAT T C T T T AAT T GC T C T C AAT G AT AA 
AGTT AAATTAgC ACT T AAAT T AGCT TCTGAGAAGGGCGTTGCAGTTGTTG 
TGGCTGcCGGAAATGAAGGCGCATTtGGTATGGATTATAGCAAACCGTTA 
T C AACT AAT c CT G ACT AC GG t ACGG t T AAT AGT C C AG CT AT T T C T G AAG A 
TACTTTGAGTGTTGCTAGCTATGAATCACTTAAAACTATCAGTGAGGTCG 
TTGAAACAACTATTGAAGGTAAGTTAGTTAAGTtGCCGATTGtGACTTCT 
AAACCTTttGACAAAGGTAAGGCCTACGATGTGGTTTATGCCAATTATGG 
T GC AAAAAAAG AC T T T GAAGGT AAGG ACT T T AAAG GT AAG AT T G C AT T AA 
TTGAGCGTGGTGGTGGACTTGATTTTATGACTAAAATCACTCATGCTACA 
AAT G C AGGT GT TGT T G GT AT C GT T AT T T T T AAC GAT C AAG AAAAAC GT G G 
AAATTTTCTAATTCCTTACCGTGAATTACCTGTGGGGGTTATTAGTAAAG 
T AGAT G G CG AG CGT AT AAAAa AT ACT T C AAGT C AGT T AAC AT T T AAC C Ag 
AGAT T T G AAGT AGT T GAT AG C C AAg GT G G C AAT C GT AT G CT GG AAC AAT C 
a AGT tGGGG C GT GAC AGC T G AAGG AGC AAT C AAG C CT GAT G T AAC AGCTT 
C T GG C T T CG a AAT T TAT T CT T C a a C CT AT AAT AAT C AAT AC C AAAC AATG 
T CT G GT AC AAGT AT G G CT T C AC C AC AT GT T G C AG GAT T AAT GAC AAT GCT 
T CAAAGT CAT TTGGCT GAGa AAT AT AAAGGGAT GAATTTAgATT CT Aa AA 
AAT T GCT AGAAT TGT CT AAAAAC AT C C T CAT GAG CT C AG C AAC AG C AT T A 
TAT AGT G AAG AGG AT AAGG C GT T T TAT T C AC C AC GT C AGC AAGG t G C AGG 
TGTAGTTGATGCTGAAAAAGCTATCCAAGCTCAATATTATGTTACTGGAA 
ACGAT G G C AAAG C T AAAAT T AAT C T CAAACG AGT G GG AG AT AAAT T T GAT 
AT CAC AGTTACAATT C AT AAACTT GT AGAAGGTGT CAAAGAATT GTATT A 
T C AAG C T AAT GT AGC AAC AG AAC AAGT AAAT AAAGGT AAAT TTGCCCTTA 
AACCACAAGCCTTGCTAGATACTAATTGGCAGAAAGTAATTCTTcGTGAT 
AAAG AAAC AC AAGT T C GAT T TACT AT T GAT G CT AGT C AAT T T AgT C AG AA 
ATTAAAAGAACAGATGGCAAATGGTTATTTCTTAgAAGGTTTTGTACGTT 
TTAAAGAAGCTAAGGATAGTAATCAGGAGTTAATGAGTATTCCTTTTGTA 
G GAT T T AAT GGT GAT T T T G CG AGC T T AC AAG CAC T T G AAAC AC C GAT T T A 
T AAG ACGCTTTCT AAAGGT AGT TTCTACTATAAACCAAATGATACAACTC 
AT AAAGACCAATTGGAGT AT AAT GAAT CAGCT C CTT TTGAAAGC AAC AAC 
TAT AC T G C C T T GT T AAC AC AAT C AG CGTCTTGGGGC TAT GT T G AT T AT GT 
C a AAAAT GGT GGGG AGT TAG AAT TAG CAC C G G AGAG T c C AAAAAG AAT T A 
TTTTAGGAACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTG 
G AAAGAG AT G C AGC GAAT AAT C CAT AT T TT G C CAT T T C T C C AAAT AAAG A 
T GG AAAT AGG GAT GAAAT C ACT C C C C AGG C AAC T T T C T T AAGAAAT GT T A 
AGGAT AT TT CTGCT C AAGTT CT AGAT CAAAAT GGAAAT GTT AT T T GGCAA 
AGTAAGGTTTTACCATCTTATCGTAAAAATTTCCATAATAATCCAAAGCA 
GAGTGATGGTCATTATCGTATGGATGCCCTTCAGTGGAGTGGTTTAgATA 
AGGATGGCAAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTAC 
AC AC C AGT AG C AG AAG G AGC AAAT AGT C AGG AG T C AG ACT T T AAAG T T C A 
AG T AAGT AC T AAGT CAC C AAAT CTT C CT T C AC G AG C T C AGT T T GAT G a AA 
C T AAT C G AAC AT T AAG CT T AGC CAT G C C T AAGG G AAG T AGTT AT GT T C C T 
AT ATAT CGT CT AC AATT AGT TTT AT CT CATGTT GTAAAAGAT G AAGAAT A 
TGGAG AT GAGACTT CTTACT AT T ATT T C CAT AT AGAT C AAGAAGGTAAAG 
C GAC AC T T C C T AAAAC GGT T AAG AT AG GAG AG AG T G AGGT T GC AGT AG AC 
CCTAAGGCCTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAaC 
G GT AAAAT T GT C T GAC C T C T T GAAT AAG G C AGT AGT AT C AG AG AAAG AAA 
ACGCTATAGTAATTTCTAACAGTTTCAAATATTTTGATAACTTGAAAAAA 
GAACCTATGT TTATTT CTAAAAAAGAAAAAGT AGT AAAC AAGAAT CT AG A 
AGAa AT AAT AT T AGT T AAG C CG C Ac AC T AC AGT T AC TAG T C Aa T CAT TGT 
CT AAAG AAAT AACT AAAT C AGG AAAT GAG AAAGT CCT C ACT T CT AC AAAC 
AATAAT AGT AGT AGAGT AGC T AAAAT CAT AT CACCTAAACATAATGGGGA
TTCTGTTAACCATACC 

SEQ ID NO. 4411 
STRAIN JM9130013 

GAGGAGCAAGAATTAAAAAACCAAGAGCAATCACCTGTAA
TTGCTAATGTTGCTCAACAGCCATCGCCATCGGTAACTACTAATACTGTT 
GAAAAAAC AT CT GT AAC AG CT G CT T CT G C T AGT AAT AC AGCG AAAGAAAT 
GGGTGAT ACAT CT GT AAAAAATGACAAAAC AGAAGATGAATTATT AGAAG 
AGTTATCTAAAAACCTTGATACGTCTAATTTGGGGGCTGATCTTGAAGAA 
G AAT AT C C C T C T AAAC C AGAG AC AAC C AAC AAT AAAG AAAGC AAT G T AG T 
AAC AAAT G C T T C AACT G C AAT AG C AC AG AAAGT TCC C T C AG CAT AT G AAG 
AGGTGAAGCCAGAAAGCAAGTCATCGCTTGCTGTTCTTGATACATCTAAA 
ATAACAAAAT TACAAGCCATAACC C AAAGAGGAAAGGGAAAT GTAGTAGC 
TAT TAT T GAT ACT G G C T T T GAT AT T AAC CAT GAT AT T T T T C G T T TAG AT A 
GC C C AAAAGAT GAT AAG CAC AG CT T T AAAACT AAG AC AG AAT T T GAGG AA 
T T AAAAG C AAAAC AT AAT AT C AC T TAT GGG AAAT GGGT T AAC GAT AAGAT 
TGTTTTTGCACATAACTACGCCAACAATACAGAAACGGTGGCTGATATTG 
CAGCAGCTATGAAAGATGGTTATGGTTCAGAAGCAAAGAATATTTCGCAT 
G GT AC AC AC GT T GC T G GT AT T T T T GTAGGT AAT AGT AAAC GT C C AGC AAT 
CAATGGTCTTCTTTTAGAAGGTGCAGCGCCAAATGCTCAAGTCTTATTAA 
T G C GT AT T C C AG AT AAAAT T GAT T C GGAC AAAT T T GGT G AAG CAT AT G C T 
AAAGC AAT CAC AG AC G CT GT T AAT CT AG GAGC AAAAACG AT T AAT AT GAG 
TAT T G G AAAAAC AG C T GAT T CT T T AAT T GCT CT C AAT GAT AAAGT T AAAT 
TAGCACTTAAATTAGCTTCTGAGAAGGGCGTTGCAGTTGTTGTGGCTGCC 
G GAAAT GAAGG C G CAT T T G GT ATGG AT T AT AGCAAAC CAT TAT C AAC T AA 
T CC T GAC T AC G GT AC GGT T AAT AG T CC AG C TAT T T C T G AAG AT AC T T T G A 
GT GT T GCT AGC TAT GAAT CAC T T AAAAC TAT C AGT G AGGT C GT T G AAAC A 
ACT AT T G AAGGT AAGT T AG T T AAGT T G CC G AT T G T GAC T T C T AAAC CT T T 
TGACAAAgGTAAgGCCTACGATGTGGTTTATGCCAATTATGGTGCAAAAA 
AAG ACT T T GAAGGT AAG GAC T T T AAAG GT AAG AT T G CAT T AAT T GAG C GT 
GGT GGT GGAC T T GAT T T TAT G AC T AAAAT C ACT CAT G C T AC AAAT GC AGG 
TGTTGTTGGTATCGTTATTTTTAACGATCAAGAAAAACGTGGAAATTTTC 
T AAT TCCTTACCGTG AAT TACCTGTGGGG ATT ATT AGT AAAGT AGATGGC 
G AGCGT AT AAAAAAT ACT T CAAG T C AGT T AAC AT T T AAC C AG AG T T T T GA 
AGT AGT T GAT AG C C AAGG T G GT AAT C GT AT G C T G G AAC AAT CAAG T T GGG 
GCGTGACAGCTGAAGGAGCAATCAAGCCTGATGTAACAGCTTCTGGCTTT 
G AAATTT AT T CT T C AACCT AT AAT AAT CAAT AC C AAACAAT GT CT GGT AC 
AAGT AT G GCT T CAC C AC AT GT T GC AGG AT T AAT GAC AAT G C T T C AAAGT C 
AT T T G GC T G AGAAAT AT AAAGGG a T GAAT T T AGAT T C T AAAAAAT T GC T A 
GAAT T GT C T AAAAAC AT C C T CAT GAG C T C AG C AAC AG CAT TAT AT AGT G A 
AG AGG AT AAG GC GT T T TAT T CAC CAC GT C AG C AAGGT GC AGG T GT AGT T G 
AT GCT G AAAAAG C TAT C CAAG C T C a AT AT TAT AT TACT GG AAAC GAT G G C 
AAAG C T AAAAT T AAT CT C AAAC GAAT GGG AG AT AAAT T T GAT AT CAC AGT 
T AC AAT T CAT a AAC T TG TAG AAGGT GT C AAAGAAt T GT AT T AT CAAG C T A 
AT GT AG C AAC AG AAC AAGT AAAT AAAGGT AAAT TTGCCCTT a AAC C AC AA 
G C CT T GC TAG AT AC T AAT T G G C AG AAAGT AAT T CT T CGT GAT AAAG AAAC 
AC AAGT T C GAT T T AC T AT T GAT G C T AGT CAAT T T AG T C AG AAAT T AAAAG 
AAC AG AT GG C AAAT G GT T AT T T C T TAG AAGGT T T T GT AC GT T T T AAAG AA 
G C C AAGG AT AGT AAT C AGG AGT T AAT G AGT AT T C C T T T T GT AGG AT T T AA 
T GG T GAT T T T G C G AAC T T AC AAG CAC T T G AAAC AC C GAT T TAT AAG AC G C 
T T T CT AAAG GT AGT T T C T AC TAT AAAC C AAAT GAT AC AAC T CAT AAAG AC 
CAATTGGAGTACAATGAATCAGCTCCTTTTGAAAGCAACAACTATACTGC 
CTTGTTAACACAATCAGCGTCTTGGGGCTATGTTGATTATGTCAAAAATG 
G T GGGGAGT T AG AAT TAG CAC C GG AG AG T C C AAAAAG AAT TAT T T T AG G A 
ACTTTTGAGAATAAGGTTGAGGATAAAACAATTCATCTTTTGGAAAGAGA 
T G C AG C GAAT AAT C CAT AT T T T GC C AT T T C T C C AAAT AAAG AT G GAAAT A 
G GGAC GAAAT C ACT C C C C AG G C AAC T T T CT T AAG AAAT GT T AAG GAT AT T 
T C T G CT C AAGT T C TAG AT C AAAAT G GAAAT G T T AT T T G GC AAAGT AAGGT 
T T T AC CAT C T TAT C G T AAAAAT T T C CAT AAT AAT C C AAAG C AAAGT GAT G 
G T CAT TAT C GT AT G GAT G C T C T T C AGT G G AGT GGT T TAG AT AAG GAT G G C 
AAAGTTGTAGCAGATGGTTTTTATACTTATCGCTTACGTTACACACCAGT 
AGC AGAAGGAGCAAATAGT C AGG AGT C AGACT T T AAAGT AC AAGT AAGT A 
C T AAG T C AC C AAAT C T T C C T T CAC GAG C T C AGT T T GAT G AAAC T AAT C G A 
AC AT T AAG CT T AG C CAT GC C T AAG G AAAGT AG T TAT GT T C C T AC AT AT C G
TTTACAATTAGTTTTATCTCATGTTGTAAAAGATGAAGAATATGGGGATG 
AGACTTCTTACCATTATTTCCATATAGATCAAGAAGGTAAAGTGACACTT 
CCTAAAACGGTTAAGATAGGAGAGAGTGAGGTTGCGGTAGACCCTAAGGC 
CTTGACACTTGTTGTGGAAGATAAAGCTGGTAATTTCGCAaCGGTAAAAT 
TGTCTGATCTCTTGAATAAGGCAGTAGTATCAGAGAAAGAAAACGCTATA 
GTAATTT CT a ACAGTT T CAAAT ATTTTGAT AACT TGAAAAAAGAACCTAT 
GT T T AT T T C T AAAAAAGAAAAAGT AGT AAAC AAG AAT C T AG AAG AAAT AA 
T AT T AGT T AAGC C G C AAAC T AC AGT T AC T AC T C AAT CAT T GT CT AAAG AA 
ATAACTAAATCAGGAAATGAGAAAGTCCTCACTTCTACAAACAATAATAG 
TAG C AG AGT AG C T AAGAT CAT AT C AC CT AAAC AT AAC G GGGAT T C T GT T A 
ACCATACC 

SEQ ID NO. 4412 
STRAIN 2603 

VDKHHSKKAILKLTLITTSILLMHSNQVNAEEQELKNQEQSPVIANVAQQPSPSVTTNTV 
EKTSVTAASASNTAKEMGDTSVBCNDKTEDELLEELSKNLDTSNLGADLEEEYPSKPETTN 
NKESNWTNASTAIAQKVPSAYEEVKPESKSSLAVLDTSKITKLQAITQRGKGNWAIID 
TGFDINHDIFRLDSPKDDKHSFKTKTEFEELKAKHNITYGKWVNDKIVFAHNYANNTETV 
ADIAAAMKDGYGSEAKNISHGTHVAGIFVGNSKRPAINGLLLEGAAPNAQVLLMRIPDKI 
DSDKFGEAYAKAITDAWLGAKTINMSIGKTADSLIALNDKVKLALKLASEKGVAVVVAA 
GNEGAFGMDYSKPLSTNPDYGTVNSPAISEDTLSVASYESLKTISEWETTIEGKLVKLP
IVTSKPFDKGKAYDWYANYGAKKDFEGKDFKGKIALIERGGGLDFMTKITHATNAGWG
IVIFNDQEKRGNFLIPYRELPVGIISKVDGERIKNTSSQLTFNQSFEWDSQGGNRMLEQ 
SSWGVTAEGAIKPDVTASGFEIYSSTYNNQYQTMSGTSMASPHVAGLMTMLQSHLAEKYK 
GMNLDSKKLLELSKNILMSSATALYSEEDKAFYSPRQQGAGWDAEKAIQAQYYITGNDG 
KAKINLKRMGDKFDITVTIHKLVEGVKELYYQANVATEQVNKGKFALKPQALLDTNWQKV 
ILRDKETQVRFTIDASQFSQKLKEQMANGYFLEGFVRFKEAKDSNQELMSIPFVGFNGDF 
ANLQALETPIYKTLSKGSFYYKPNDTTHKDQLEYNESAPFESNNYTALLTQSASWGYVDY 
VKNGGELELAPESPKRIILGTFENKVEDKTIHLLERDAANNPYFAISPNKDGNRDEITPQ 
ATFLRNVKDISAQVLDQNGNVIWQSKVLPSYRKNFHNNPKQSDGHYRMDALQWSGLDKDG 
KWADGFYTYRLRYTPVAEGANSQESDFKVQVSTKSPNLPSRAQFDETNRTLSLAMPKES 
SYVPTYRLQLVLSHWKDEEYGDETSYHYFHIDQEGKVTLPKTVKIGESEVAVDPKALTL 
WEDKAGNFATVKLSDLLNKAWSEKENAIVISNSFKYFDNLKKEPMFISKKEKWNKNL
EEIILVKPQTTVTTQSLSKEITKSGNEKVLTSTNNNSSRVAKIISPKHNGDSVNHTLPST 
SDRATNGLFVGTLALLSSLLLYLKPKKTKNNSK

SEQ ID NO. 4413 
STRAIN A909 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTSASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKRL.R.G 
L.R. DCIN. AWWWT. FYD . NHSCYKCRCCWYRYF. RSRKTWKFSNSLP . ITCGGY. . SRW 
RAYKKYFKSVNI . PEF. SS. . PRWQSYAGTIKLGRDS . RSNQA. CNSFWL . NLFFNL . .S 
I PNNVWYKYGFTTCCRINDNASKSFG . EI . RDEFRF . KIARIV . KHPHELSNSII . . RG . 
GVLFTTSARCRCS . C . KS YPSS ILCYWKRWQS . N . SQTSGR . I . YHSYNS . TCRRCQRIV 
LSS . CSNRTSK . R . ICP . TTSLARY . LAESNSS . . RNTSSIYY . F . SI . SE IKRT DGKWL 
FLRRFCTF . RSQG . . SGVNEYS FCRI . W . FCELTST . NTDL . DAF . R . FLL . TK . YNS . R 
PIGVQ . ISSF . KQQL YCLVNT I SVLGLC . LCQKWWGVRISTGESKKNYFRNF. E .G.G.N 
NSSFGKRCSE . SIFCHFSK. RWK. G .NHSPGNFLKKC . GYFCSSSRSKWKCYLAK . GFTI 
LS . KFP. . SKAK. WSLSYGCPSVEWFR. GWQSCSRWFLYLSFTLHTSSRRSK. SGVRL . S 
SSKY . VTKSSFTSSV . . N . SNIKLSHA . GK . LCSYISSTISFISCCKR . RIWR . DFLPLF 
PYRSRR . SDTS . NS . DRRE . GCSRP . DLDTCCGR . SW . FRNGKIV . PLE . GSS IRERKRY 
SNF.QFQIF. . LEKRTYVYF. RRKSSKQESRRNSIS . AANYSYYSIIV . RNNSIRK. ESP 
HFYKQ . . . QSS . DHIT . T . RGFC . PY 

SEQ ID NO. 4414 
STRAIN H36B 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTSASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAWVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQSFEWDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGWDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY 
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDSSQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV 
QVSTKSPNLPSRAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHWKDEEYGDETSYHYF 
HIDQEGKVTLPKTVKIGESEVAVDPKTLTLWEDKAGNFATVKLSDLLNKAVVSEKENAI 
VISNNFKYFDNLKKEPMFISKEGKVVNKNLEEIALVKPQTTVTTQSLSKEITQSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4415 
STRAIN 18RS21 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTAASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKTEFEE
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSIGK 
TADSLIALNDKVKLALKLASEKGVAWVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEVVETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGIISKVDG 
ERIKNTSSQLTFNQSFEWDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGWDAEKAIQAQYYITGNDGKAKINLKRMGDKFDITVTIHKLVEGVKELY 
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTISKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT 
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV 
QVSTKSPNLPSRAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHWKDEEYGDETSYHYF 
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI 
VISNSFKYFDNLKKEPMFISKKEKVVNKNLEEIILVKPQTTVTTQSLSKEITKSGNEKVL 
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4416 
STRAIN M732 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKSESK 
SSLAVLDTSKITKLQATTQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNILHGTHVAGIFVG 
NSKRPAINSLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAIIDAVNLGAKTINMSLGK
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKILKVRT
LKVRLH. LSVWDLIL. LKSLMLQMQVLLVSLFLTIKKNVEIF . FLTVNYLWGLLVK . MA 
SV . KILQVS . HLTRVLK. LIAKVAIVCWNNQVGA. QLKEQSSLM. QLLALKFILQPIIIN 
TKQCLVQVWLHHMLQD . . QCFKVIWLRNIKG . I . ILKNC . NCLKTSS . AQQQHYIVKRIR 
RFIHHVSKVQV. LMLKKLSKLNIMLLETMAKLKLISNEREINLISQLQFINL . KVSKNCI 
IKLM . QQNK . IKVNLPLNHKPC . ILIGRK . FFVIKKHKFDLLLMLVNLVRN . KNRWQMVI 
S . KVLYVLKKPRIVIRS . . VFLL . DLMVILRTYKHLKHRFIRRFLKWSTINQMIQLIKT 
NWSTMNQLLLKATTILPC . HNQRLGAMLIMSKMVGS . N . HRRVQKELF . ELLRIRLRIKQ 
FIFWKEMQRIIHILPFLQIKMEIGTKSLPRQLS . EMLRIFLLKF . IKMEMLFGKVRFYHL 
IVKISIIIQSKVMVIIVWMLFSGVV. IRMAKL. QMVFILIAYVTHQ. QKEQIVRSQTLKF
K . VLSHQI FLHELS LMKLIEH . A . PCLRKVVMFLHIVYN . FYLML . KMKNMGMRLLTIIS 
I . IKKVK . HFLKRLR . ERVRLR . TLRP . HLLWKIKLVILQR . NCLTS . IRQ . YQRKKTL . 
. FLTVSNILIT . RKNLCLFLKKEK . . TRI . KK . H . LSLKLQLLLNHCLKK . LNQEMRKSS 
LLQTIIVAE. LRSYHLNITGILLTI

SEQ ID NO. 4417
STRAIN COH1 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKSESK 
SSLAVLDTSKITKLQATTQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEEEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNILHGTHVAGIFVG 
NSKRPAINSLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAIIDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKILKVRT 
LKVRLH . LSWVDLIL . LKS LMLQMQVLL VS L FLT I KKNVE I F . FLTVNYLWGLLVK . MA 
S V . KI LQVS . HLTRVLK . L I AKVAI VCWNNQVGA . QLKEQSS LM . QLLALKFI LQPIIIN 
TKQCLVQVWLHHMLQD . . QC FKVI WLRN I KG . I . ILKNC . NCLKTSS . AQQQHYIVKRIR 
RFIHHVSKVQV. LMLKKLSKLNIMLLETMAKLKLISNEREINLISQLQFINL . KVSKNCI 
IKLM . QQNK . IKVNLP LNHKPC . ILIGRK . FFVIKKHKFDLLLMLVNLVRN . KNRWQMVI 
S . KVLYVLKKPRIVIRS . . VFLL . DLMVILRTYKHLKHRFIRRFLKWSTINQMIQLIKT 
NWSTMNQLLLKATT ILPC . HNQRLGAMLIMSKMVGS . N . HRRVQKELF . ELLRIRLRIKQ 
FIFWKEMQRIIHILPFLQIKMEIGTKSLPRQLS . EMLRIFLLKF . IKMEMLFGKVRFYHL
IVKISIIIQSKVMVIIVWMLFSGVV. IRMAKL. QMVFILIAYVTHQ . QKEQIVRSQTLKF 
K. VLSHQIFLHELSLMKLIEH. A. PCLRKVVMFLHIVYN . FYLML . KMKNMGMRLLTIIS 
I . IKKVK . HFLKRLR . ERVRLR . TLRP . HLLWKIKLVILQR . NCLTS . IRQ . YQRKKTL 
. FLTVSNILIT . RKNLCLFLKREK. . TRI . KK. H . LSLKLQLLLNHCLKK . LNQEMRKSS 
LLQTIIVAE . LRSYHLNITGILLTI 

SEQ ID NO. 4418 
STRAIN M781 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKSESK 
SSLAVLDTSKITKLQATTQRGKGNWAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNILHGTHVAGIFVG 
NSKRPAINSLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAIIDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAWVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKILKVRT 
LKVRLH. LSWVDLIL . LKSLMLQMQVLLVS LFLT IKKN VE I F . FLTVNYLWGLLVK . MA 
SV. KILQVS . HLTRVLK. LI AKVAI VCWNNQVGA. QLKEQSSLM . QLLALKFILQPIIIN 
TKQCLVQVWLHHMLQD . . QC FKVI WLRN I KG . I . ILKNC . NCLKTSS . AQQQHYIVKRIR 
RFIHHVSKVQV. LMLKKLSKLNIMLLETMAKLKLISNEREINLISQLQFINL. KVSKNCI 
IKLM . QQNK . IKVNLPLNHKPC . ILIGRK . FFVIKKHKFDLLLMLVNLVRN . KNRWQMVI 
S . KVLYVLKKPRIVIRS . . VFLL . DLMVILRTYKHLKHRFIRRFLKWSTINQMIQLIKT 
NWSTMNQLLLKATT ILPC . HNQRLGAMLIMSKMVGS . N . HRRVQKELF . ELLRIRLRIKQ 
FI FWKEMQRI IHILP FLQIKME IGTKSLPRQLS . EMLRIFLLKF . IKMEMLFGKVRFYHL 
IVKISIIIQSKVMVIIVWMLFSGVV. IRMAKL. QMVFILIAYVTHQ. QKEQIVRSQTLKF 
K. VLSHQIFLHELSLMKLIEH. A. PCLRKVVMFLHIVYN . FYLML . KMKNMGMRLLTIIS 
I . IKKVK . HFLKRLR . ERVRLR . TLRP . HLLWKIKLVILQR . NCLTS . IRQ . YQRKKTL 
. FLTVSNILIT . RKNLCLFLKKEK . . TRI . KK . H . LSLKLQLLLNHCLKK . LNQEMRKSS 
LLQTIIVAE. LRSYHLNITGILLTI 

SEQ ID NO. 4419 
STRAIN JM9130013 

EEQELKNQEQSPVIANVAQQPSPSVTTNTVEKTSVTAASASNTAKEMGDTSVPCNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKPESK 
SSLAVLDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKTKTEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSIGK 
TADSLIALNDKVKLALKLASEKGVAVVVAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGVVGIVIFNDQEKRGNFLIPYRELPVGIISKVDG 
ERIKNTSSQLTFNQSFEVVDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGVVDAEKAIQAQYYITGNDGKAKINLKRMGDKFDITVTIHKLVEGVKELY 
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT 
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKVVADGFYTYRLRYTPVAEGANSQESDFKV
QVSTKSPNLPSRAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHVVKDEEYGDETSYHYF
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI
VISNSFKYFDNLKKEPMFISKKEKVVNKNLEEIILVKPQTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4420 
STRAIN 090 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTVKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNVVTNASTAIAQKVPSAYEEVKPESK 
SSLAVFDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDVVYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQSFEVVDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGVVDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDAFQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV 
QVSTKSPNLPLLAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHWKDEEYGDETSYHYF
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI
VISNSFKYFDNLKKESMFISKEGKWNKNLEEITLVKPQTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4421 
STRAIN CJB110 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNLGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPESK 
SSLAVFDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKTKAEFEE
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSLGK 
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE 
DTLSVASYESLKTISEWETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQSFEWDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQNHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK 
AFYSPRQQGAGWDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFANLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDAFQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV
QVSTKSPNLPLLAQFDETNRTLSLAMPKESSYVPTYRLQLVLSHVVKDEEYGDETSYHYF
HIDQEGKVTLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI
VISNSFKYFDNLKKESMFISKEGKVVNKNLEEITLVKPQTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4422 
STRAIN 1169NT 

EEQELKNQEQSPVIANVAQQPSPSVTTNIVEKTSVTAASASNTAKEMGDTSVKNDKTEDE 
LLEELSKNLDTSNMGADLEEEYPSKPETTNNKESNWTNASTAIAQKVPSAYEEVKPKSK
SSLAVLDTSKITKLQAITQRGKGNVVAIIDTGFDINHDIFRLDSPKDDKHSFKNKAEFEE 
LKAKHNITYGKWVNDKIVFAHNYANNTETVADIAAAMKDGYGSEAKNISHGTHVAGIFVG 
NSKRPAINGLLLEGAAPNAQVLLMRIPDKIDSDKFGEAYAKAITDAVNLGAKTINMSIGK
TADSLIALNDKVKLALKLASEKGVAVWAAGNEGAFGMDYSKPLSTNPDYGTVNSPAISE
DTLSVASYESLKTISEVVETTIEGKLVKLPIVTSKPFDKGKAYDWYANYGAKKDFEGKD 
FKGKIALIERGGGLDFMTKITHATNAGWGIVIFNDQEKRGNFLIPYRELPVGVISKVDG 
ERIKNTSSQLTFNQRFEVVDSQGGNRMLEQSSWGVTAEGAIKPDVTASGFEIYSSTYNNQ 
YQTMSGTSMASPHVAGLMTMLQSHLAEKYKGMNLDSKKLLELSKNILMSSATALYSEEDK
AFYSPRQQGAGWDAEKAIQAQYYVTGNDGKAKINLKRVGDKFDITVTIHKLVEGVKELY
YQANVATEQVNKGKFALKPQALLDTNWQKVILRDKETQVRFTIDASQFSQKLKEQMANGY 
FLEGFVRFKEAKDSNQELMSIPFVGFNGDFASLQALETPIYKTLSKGSFYYKPNDTTHKD 
QLEYNESAPFESNNYTALLTQSASWGYVDYVKNGGELELAPESPKRIILGTFENKVEDKT
IHLLERDAANNPYFAISPNKDGNRDEITPQATFLRNVKDISAQVLDQNGNVIWQSKVLPS 
YRKNFHNNPKQSDGHYRMDALQWSGLDKDGKWADGFYTYRLRYTPVAEGANSQESDFKV
QVSTKSPNLPSRAQFDETNRTLSLAMPKGSSYVPIYRLQLVLSHWKDEEYGDETSYYYF 
HIDQEGKATLPKTVKIGESEVAVDPKALTLWEDKAGNFATVKLSDLLNKAWSEKENAI 
VISNSFKYFDNLKKEPMFISKKEKWNKNLEEIILVKPHTTVTTQSLSKEITKSGNEKVL
TSTNNNSSRVAKIISPKHNGDSVNHT

SEQ ID NO. 4501 
STRAIN 2603 

ATGAAAAAGATTAGAAAAAGTTTAGGACTTCTACTATGTTGCTTTTTAGGATTGGTACAA 
TTAGCGTTTTTTTCGGTAGCCAGTGTAAATGCTGATACCCCTAATCAACTAACAATCACA 
CAGATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTATGGACTGTG
ACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGATAGCGAATTGAACCAGAAG
TATAAGAGTATCTTGACTTCTCCTACTGATACTAATGGTCAGACAAAGATAGCACTCCCA
AATGGTTCGTACTTTGGTCGTGCTTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCT
TTTTATATTGAATTACCAGATGATAAGTTATCAAATCAATTACAGATAAATCCTAAGCGA
AAAGTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAGATAAAGAAA
AGGCTATCCGGAGTAATATTTGTATTATACGATAACCAGAATCAGCCAGTTCGCTTTAAA
AATGGACGATTTACGACCGATCAAGATGGGATTACTTCATTAGTAACTGATGATAAGGGA
GAAATTGAGGTTGAAGGTTTATTACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTA
ACTGGTTACCGTATATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAG
GAAGTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACCATCACAA
CCGCTTTTTCCACAATCATTTCTTCCTAAAACAGGAATGATTATTGGTGGAGGACTGACA
ATTCTTGGTTGTATTATTTTGGGAATTTTGTTTATCTTTTTAAGAAAAACTAAAAATAGC
AAATCTGAAAGAAACGATACAGTA

SEQ ID NO. 4502 
STRAIN 090 

GATACCCCTAATCAACTAACAATCACAC 

AG AT AG GAC T T C AG C C AAAT AC TAG AG AG GAG GG GAT T T C TT AT C G T T T A 
T GG ACT GT GAC T GAC AAC T T AAAAGT T GAT T T AT T GAG C C AAAT GAC AGA 
TAG C GAAT T GAAC C AG AAGT AT AAGAG TAT C T T GAC T T C T C C TACT GAT A 
CT AAT GG t C AG AC AAAGAT AG C ACT C C C AAAT GGTTCGTACTTTGGTCGT 
G C T T AT AAAGC T GAT C AAAG C GT T T C AAC AAT AGT AC CT T T T TAT AT T GA 
AT T AC C AGAT GAT AAG T TAT C AAAT C AAT T AC AG AT AAAT C C T AAG C G AA 
AAGTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAG 
ATAAAGAAAAGGCTATCAGGAGTAATATTTGTATTATACGATAACCAGAA 
T C AG CC AGT T C G CT T T AAAAAT G G ACG AT T TAG G AC CGAT C AAG AT G GG A 
T T AC T T CAT T AGT AAC T GAT G AT AAGGG AG AAAT T G AGG T T G AAGG T T T A 
T T AC C T G GT AAG T AT AT T T T T C G AGAAG C AAAAG C AC T AAC TGGtTACCG 
TATATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGG
AAGTaGAGGTaGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAA
CCATCACAACCG 

SEQ ID NO. 4503 
STRAIN H36B 

GAT AC C CCTAAT C AACT AACAATC ACAC AGA 

T AGG AC T T C AG C C AAAT ACT AC AG AGG AGG G GAT T T C T TAT CGT T T AT GG 
AC T GT G AC T GAC AAC T T AAAAGT T GAT T T AT T G AG C C AAAT GAC AG AT AG 
C GAAT T GAAC C AG AAG TAT AAG AG TAT C T T GAC T T C T C C T AC T GAT AC T A 
AT GG t C AG AC AAAG AT AG C ACT C C C AAAT GG T T C GT AC TTTGGTCGTGCT 
TATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAATT 
AC CAGATGAT AAGT TAT CAAAT CAAT TACAGAT AAAT CCT AAG CG AAAAG 
T T GAAAC AG G C CG AT T AAAAC T T AT T AAAT AT AC AAAAG AAG G AAAG AT A 
AAGAAAAGGCT wT CCGGAGT AAT ATTT GT AT TAT AC GAT AAC C AG AAT CA 
G C C AGT T C G C T T T AAAAAT GGAC GAT T T AC GAC C GAT C AAG AT GG G AT T A 
C T T CAT T AGT AAC T GAT GAT AAGGG AG AAAT T GAGGT T GAAG G T T TAT T A 
CCT G GT AAGT AT AT T T T T C GAG AAG C AAAAG C ACT AACT GGT TAG C GT AT 
AT CT AT GAAG GAT G C T G T AGT T G C T G T AG T T G C T AAT AAAAC AC AGG AAG 
TAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACCA 
TCACAACCGC

SEQ ID NO. 4504 
STRAIN 18RS21 

GATACCCCTAATCAACTAACAATCACACAG

ATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTATG
GACTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGATA
GCGAATTGAACCAGAAGTATAAGAGTATCTTGACTTCTCCTACTGATACT
AATGGtCAGACAAAGATAGCACTCCCAAATGGTTCGTACTTTGGTCGTGC
TTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAAT
TACCAGATGATAAGTTATCAAATCAATTACAGATAAATCCTAAGCGAAAA
GTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAGAT
AAAGAAAAGGCTATCCGGAGTAATATTTGTATTATACGATAACCAGAATC
AGCCAGTTCGCTTTAAAAATGGACGATTTACGACCGATCAAGATGGGATT
ACTTCATTAGTAACTGATGATAAGGGAGAAATTGAGGTTGAAGGTTTATT
ACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGTTACCGTA
TATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGGAA
GTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACC
ATCACAACC

SEQ ID NO. 4505 
STRAIN CJB110 

GATACCCCTAATCAACTAACAATCACACA 

GATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTAT 
GGaCTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGAT 
AGCGAATTgAACCAGAAGTATAAGAGTATCTTGACTTCTCctACTGATAc 
TAATGGTCAGACAAAGATAGCACTCCCAAATGGTTcGTACTTTGGTCGTG 
CTTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAA 
TTACCAGATGATAAGTTATCAAATCAATTACAGatAAATCCTAAGCGAAA 
AGTTGAAACAGGCCGATTaaAACTTATTAAATATACAAAAGAAGGAAAGA
TAAAGAAAAGGCTaTCAGGAGTAATATTTGTATTATACGATAACCAGAAT 
CAGCCAGTTCGCTTTAAAAATGGACGATTTACGACCGATCAAGATGGGAT 
TACTTCATTAGTAACTGATGATAAGGGAGAAATTGAGGTTGAAGGTTTAT 
TACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGTTaCCGT 
ATATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGGA 
AGTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAAC 
CATCACAACC 

SEQ ID NO. 4506 
STRAIN 1169NT 

GATACCCCTAATCAACTAACAATCACACAG

ATAGGACTTCAGCCAAATACTACAGAGGAGGGGATTTCTTATCGTTTATG 
GACTGTGACTGACAACTTAAAAGTTGATTTATTGAGCCAAATGACAGATA
GCGAATTGAACCAGAAGTATAAGAGTATCTTGACTTCTCCTACTGATACT
AATGGtCAgaCAAAGATAGCACTCCCAAATGGTTCGTACTTTGGTCGTGC
TTATAAAGCTGATCAAAGCGTTTCAACAATAGTACCTTTTTATATTGAAT
TACCAGATGATAAGTTATCAAATCAATTACAGATAAATCCTAAGCGAAAA
GTTGAAACAGGCCGATTAAAACTTATTAAATATACAAAAGAAGGAAAGAT
AAAGAAAAGGCTATCAGGAGTAATATTTGTATTATACGATAACCAGAATC
AGCCAGTTCGCTTTAAAAATGGACGATTTACGACCGATCAAGATGGGATT
ACTTCATTAGTAACtgaTGATAAGGGAGAAATTGAGGTTGAAGGTTTATT
ACCTGGTAAGTATATTTTTCGAGAAGCAAAAGCACTAACTGGTTACCGTA
TATCTATGAAGGATGCTGTAGTTGCTGTAGTTGCTAATAAAACACAGGAA
GTAGAGGTAGAAAACGAAAAAGAAACTCCTCCACCAACAAATCCTAAACC
ATCACAACC

SEQ ID NO. 4507 
STRAIN 2603 

MKKIRKSLGLLLCCFLGLVQLAFFSVASVNADTPNQLTITQIGLQPNTTEEGISYRLWTV 
TDNLKVDLLSQMTDSELNQKYKSILTSPTDTNGQTKIALPNGSYFGRAYKADQSVSTIVP 
FYIELPDDKLSNQLQINPKRKVETGRLKLIKYTKEGKIKKRLSGVIFVLYDNQNQPVRFK 
NGRFTTDQDGITSLVTDDKGEIEVEGLLPGKYIFREAKALTGYRISMKDAVVAVVANKTQ 
EVEVENEKETPPPTNPKPSQPLFPQSFLPKTGMIIGGGLTILGCIILGILFIFLRKTKNS 
KSERNDTV

SEQ ID NO. 4508
STRAIN 090 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK
YIFREAKALTGYRISMKDAVVAVVANKTQEVEVENEKETPPPTNPKPSQP

SEQ ID NO. 4509 
STRAIN H36B 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT 
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK 
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK 
YIFREAKALTGYRISMKDAVVAVVANKTQEVEVENEKETPPPTNPKPSQP

SEQ ID NO. 4510 
STRAIN 18RS21 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT 
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK 
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK 
YIFREAKALTGYRISMKDAVVAVVANKTQEVEVENEKETPPPTNPKPSQ

SEQ ID NO. 4511 
STRAIN 1169NT 

DTPNQLTITQIGLQPNTTEEGISYRLWTVTDNLKVDLLSQMTDSELNQKYKSILTSPTDT 
NGQTKIALPNGSYFGRAYKADQSVSTIVPFYIELPDDKLSNQLQINPKRKVETGRLKLIK 
YTKEGKIKKRLSGVIFVLYDNQNQPVRFKNGRFTTDQDGITSLVTDDKGEIEVEGLLPGK 
YIFREAKALTGYRISMKDAVVAVVANKTQEVEVENEKETPPPTNPKPSQ

SEQ ID NO. 4601 
STRAIN A909 

T GAC AAAT AT TAT T T TAG C C AACGT G GT T TAG AG C AAG C AGG T GT AACT AT AT T AC C TTT 
C T C AC C G AAT AAT AT C AGT G AG GAT T TAG AG AT TAT T G C AG G AAAT GCTTTTCGTC C AG A 
T AAC AAT G AAGAGT T GG CT T AT GT TAT T GAAAAGGG C T AT CAT T T T AAACGAT AT CAT G A 
AT T T C T CGG AG AT T T TAT G C GT C AG T T C AC T AGT C T AG G T G TAG CT G G GGC AC AT G G AAA 
AACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATATTACAGACACTTCTTTCCT 
AAT T GG AG AT GGT AC AGGAC GT GGT T C T G C T AAT G C T AAT T AC TTTGTGTTT G AAGCT GA 
T GAAT AC GAAC GT CAT T T TAT G C C GT AC CAT C C AG AAT AC T C AAT TAT T AC C AAT AT T G A 
TTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGTATTCAATGCCTTTAATGACTA 
T GC T AAG CAAGT T C AAAAAGGT T T AT T CAT T T AT GG AG AAG AT C C AAAAC T T CAT G AAAT 
C AC T T C T GAG G C AC C AAT AT AT TAT TAT G GT T T T G AAG AT T C AAAT GAT T T TAT AG C AAA 
AG AC AT C ACT C GAAC T GT T AAT GGT T C T GAC TT T AAGGT T T T C TAT AAC C AAG AAG AAAT 
TGGTCAGTTTCATGTACCAGCATACGGTAAACATAATATCTTAAATGCAACTGCTGTTAT 
TGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCTGAGCATTTGAAGACATT 
TTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGACGATACTGTCATTATTGATGA 
CTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGATGCTGCTCGACAAAAATACCC 
GTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCACTCGTACGATAGCTCTTTT 
AGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTATCTCGCTCAAATATATGG 
TTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAAGATTTAGCTGCTAAGATTGT 
C AAAC AC T C AG AT T TAG T GAC AGT C G AAAAT GT CTCGCCTT T AC T C AAT CAT GAT AAT G C 
T GT CT AT GT CT T TAT G GG T GC T GG AG AC AT T C AAT T G TAT GAG CGCTCTTTT GAAG AAT T 
AT TAG C T AAC C T AAC T AAAAAT AC AC AA 

SEQ ID NO. 4602 
STRAIN 1169NT 

AAAAG C AGG CT C T AGT G AC G T T GAC AAAT AT TAT T T T ACC C AAC GT GGT T TAG AG C AAG C 
AGGT G T AAC TAT AT T AC C T T T C T C AC C GAAT AAT AT C AGT G AGGAT T TAG AG AT TAT T G C 
AGGAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTA 
TCATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGG 
TGTAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAA 
TATTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAA 
T T AC TTTGTGTTT GAAG C T GAT GAAT AC G AACGT CAT T T T AT G C C G T AC CAT C C AG AAT A 
CT C AAT T AT T AC C AAT AT T G AT TTT GAC CAT C CT GAT TAT T T T AC AG G C C TAG AG G AC GT 
AT T C AAT G C C T T T AAT GAC TAT G CT AAG C AAGT T C AAAAAGGT T TAT T CAT T TAT GG AGA 
AGATCCAAAACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGA
T T C AAAT GAT T T T AT AGC AAAAGAC AT C AC T C GAACT GT T AAT GGT T CT GACT T T AAG G T 
TTTCTATAACCAAGAAGAAATTGGTCAGTTTCATGTACCAGCATACGGTAAACATAATAT 
CTTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGT 
AGC T GAG CAT T T GAAG AC AT T T T C AG GGGT AAAG C G T C GT T T T ACT GAGAAGAT TAT T G A 
C GAT ACT GT CAT TAT T GAT G ACT T T GCT C AC CAT C CT AC T GAGAT T AT T G C G AC AT TAG A 
TGCTGCTC G AC AAAAAT AC C CGT C AAAAG AAAT T GT AG CT AT T T T C C AAC C GC AT ACGT T 
CACTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGT 
T TAT C T C G C T C AAAT AT AT GGTTCTGC T AGAGAAG T AGAT AAT GGT GAG GT G AAGGT AG A 
AG AT T T AG C T G C T AAG AT T GT C AAAC AC T C AG AT T T AGT G AC AG T C G AAAAT GT C T CG C C 
TTTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTA 
T GAG CGCTCTTTT GAAG AAT TAT TAG C T AAC C T AAC T AAAAAT AC AC AA 

SEQ ID NO. 4603 
STRAIN 090 

AAAGCAGGCTCTAGTGACGTTGACAAATATTATTTTACCCAACGTGGTTTAGAGCAAGCA 
GGTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCA 
G G AAAT GCTTTTCGTC C AG AT AAC AAT GAAG AGT T GG C T TAT GT TAT T GAAAAG GG C T AT 
CATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGT 
GTAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAAT 
AT T AC AG AC AC TTCTTTCC T AAT T G GAGAT G GT AC AG GAC GT GGT T C T G CT AAT GC T AAT 
TACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATAC 
T C AAT TAT TAG C AAT AT T GAT T T T GAC CAT C CT GATT ATT T T AC AGG C C T AG AGGAC G T A 
T T C AAT GCT T T T AAT GAC TAT G C T AAG C AAGT T C AAAAAGGT T T AT T CAT T T AT GG AG AA 
GAT T C AAAAC T T CAT G AAAT C AC T T C T AAGG C AC C AAT AT AT TAT T ATGG TT T T GAAG AT 
T C AAATGATTTT AT AGC AAAAGAC AT C ACT CGAACTGTT AAT GGT TCT GAC TTTAAGGTT 
T T C T AT AACC AAG AAG AAAT T G GT C AG T T T CAT GT AC C AG C AT ACGGT AAAC AT AAT AT C 
T T AAAT G C AAC T GC T GT T AT T G CT AAC C T T T AC AT AAT GG GAAT T GAT AT GG CAT T AGT A 
GCT GAG CAT T T GAAG AC AT T T T CAGGG GT AAAAC GT C G T T T T AC T GAGAAGAT TAT T GAC 
GATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGAT 
GCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTC 
ACTCGTACGATAGCTCTTTTAGACGATTTTGCCCATGCTTTGAGTCAAGCGGATAGCGTT 
TAT C T T G CT C AAAT AT AT GG T T C T G CT AG AG AAGT AG AT AAT GGT GAG G T G AAGG T AGAA 
GAT T T AG CT G C T AAG AT T GT C AAAC AC T C AG AT T T AGT GAC AGT CG AAAAT GTCTCGCCT 
TTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTAT 
GAG CGCTCTTTT GAAG AAT TAT TAG CT AAC C TAACT AAAAAT AC AC AA 

SEQ ID NO. 4604 
STRAIN H36B 

AAAAGCAGGCTCTAGTgACGTTgACAAATATtATTTTACTCAACGTGGTTtAGAGCAAGCAGGT 
AT AACT AT AT T AC C T T T C T C AC CG AAT AAT AT C AGT G AGG AT T TAG AG AT TAT T G C AGG A 
AATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATCAT 
TTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTGTA 
GCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATATT 
AC AG AC AC TTCTTTCC T AAT T G GAGAT G GT AC AGGACGT GG T T CT G C T AAT G C T AAT T AC 
TTTGTGTTT GAAG C T GAT GAAT AC G AAC GT CAT T T TAT G CC G T AC CAT C C AG AAT ACT C A 
AT TAT T AC C AAT AT T GAT T T T GAC CAT C C T GAT TAT T T T AC AG G C CT AG AG G AC GT AT T C 
AAT G CT T TT AAT GAC TAT G C T AAG C AAGT T C AAAAAGGT T TAT T CAT T TAT G GAGAAGAT 
C C AAAAC T T CAT GAAAT C ACT T CT G AG GC AC C AAT AT AT TAT TAT GGT T T T GAAG AT T C A 
AAT GATTT TAT AGC AAAAG AT AT C ACT CGAACTGTT AAT GGTTCTGACTTTAAGGTTTTC 
TAT AAC C AAG AAGAAAT T GGT C AG T T T C AC GT AC C AG CAT AC G G T AAAC AT AAT AT CT T A 
AATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCT 
GAG CAT T T GAAG AC AT T T T CAGGG GT AAAAC GT C G T T T T AC T GAG AAAAT TAT T G ACG AT 
AC T G T CAT TAT T GAT GAC TTTGCTCAC CAT C C T AC T GAGAT TAT T G C GAC AT TAG AT GCT 
GCT CG AC AAAAAT AC C CGT C AAAAG AAAT TGTAGCT AT TTTCCAACCGC AT ACGT TC ACT 
CGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTAT 
C T C G CT C AAAT AT AT GGTTCTGC TAG AG AAG TAG AT AAT GGT GAG G T G AAG GT AG AAG AT 
T T AG CT G C T AAG AT T GT C AAAC AC T C AG AT T TAG T GAC AGT CG AAAAT GTCTCGCCTTTA 
CTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTATGAG 
CGCTCTTTT GAAG AAT TAT TAG C T AAC C TAACT AAAAAT AC AC AA 

SEQ ID NO. 4605 
STRAIN 18RS21 

AAAGCAGGCTCTAGTGACGTTGACAAATATTATTTTACCCAACGTGGTTTAGAGCAAGCA 
GGTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCA
G GAAAT GCTTTTCGTC CAG AT AAC AAT G AAG AGT T GG CT T AT GT T AT T G AAAAGG G CT AT 
CAT T T T AAACG AT AT CAT GAAT T T C T C G GAGAT T T TAT G CGT C AGT T C AC T AGT C T AG GT 
GT AG C T GGGG C AC AT G G AAAAAC C T C AACG AC AGGT T T AT T AGC T C AT GT T T T AAAAAAT 
AT T AC AG AC ACT T CT T T C C T AAT T G GAGAT GGT AC AG GACGT GGT T C T G CT AAT G CT AAT 
TACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATAC 
TCAATTATTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCTTAGAGGACGTA 
TTCAATGCCTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGAA 
GAT C C AAAAC T T CAT GAAAT C AC T T C T G AG GC AC C AAT AT AT TAT TAT G GT T T T G AAG AT 
T C AAAT GAT T T TAT AG C AAAAG AC AT C AC T C G AAC T GT T AAT GGT T CT G ACT T TAAGGT T 
T T C TAT AAC C AAGAAG AAAT T GG T CAG T T T C AT GT AC C AGC AT ACGGT AAAC AT AAT AT C 
T T AAAT G C AAC T GC T GT TAT T G C T AAC C T T T AC AT AAT G GG AAT T GAT AT GG C AT T AGT A 
GCTGAGCATTTGAAGACGTTTTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGAC 
GATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGAT
GCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTC
ACTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTT
TATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAA
GATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCT
TTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTAT
GAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA

SEQ ID NO. 4606 
STRAIN M732 

AAAAGCAGGCTCTAGTGACGTtGACAAATAtTATTTTACCCAACGTGGTTTAGAGCAAGCAG 
GTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAG
GAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATC
ATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTG 
TAGCTGGGGCACATGGAAAAACCTCAACGACAGGTTTATTAGCTCATGTTTTAAAAAATA 
TTACAGACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATT 
ACT T T GT GT T T GAAG C T GAT GAAT AC G AACG T CAT T T TAT G C C GT AC CAT C CAG AAT AC T 
C AAT TAT TAG C AAT AT T GAT T T T G AC CAT C CT G AT TAT T T T AC AGG C C TAG AG GAC GT AT 
T C AAT GC CT T T AAT GAC T AT G C T AAG C AAG T T C AAAAAGGT T TAT T CAT T T AT GGAG AAG 
AT C C AAAAC T T CAT GAAAT C AC T T C T G AGGC ACC AAT AT AT TAT TAT GGT T T T GAAG AT T 
C AAAT GAT T T TAT AG C AAAAGAC AT C AC T C GAAC T GT T AAT G GT T C T GAC T T T AAGG T T T 
T C T AT AAC C AAG AAGAAAT T G G T C AGT T T CAT GT AC CAG CAT AC G GT AAAC AT AAT AT C T 
TAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAG 
C T GAG CAT T T GAAGAC AT T T T C AG GGGT AAAG CGT C GT T T T AC T GAG AAG AT TAT T GAC G 
AT AC T GT CAT TAT T GAT G ACT TT G C T C AC CAT C C T ACT GAGAT TAT T GC GAC AT TAG AT G 
CTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCA 
CTCGTACGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTT 
ATCTCGCT C AAAT AT AT G G T T C T G CT AG AGAAGT AGAT AAT GGT GAGGT G AAg GT AG AAG 
AT T T AGC T G CT AAg AT T GT C AAAC AC T CAG AT T T AGT GAC AGT C G AAAAT GT CTCGCCTT 
TACT C AAT CAT GAT AAT G C T GT C TAT GT CT T TAT GGGT G C T GGAG AC AT T C AAT T GT AT G 
AG CGCTCTTTT G AAGAAT T AT T AG C T AAC C T AAC T AAAAAT AC AC AA 

SEQ ID NO. 4607 
STRAIN M781 

AAAG CAG G C T CT AG T G AC GT t GAC AAAT AT T AT T T T AC C C AAC GT GGT T T AGAG C AAG CAG 
GTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCAG 
GAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATC
ATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGT
GT AG C T G G G G C AC AT G G AAAAAC C T C AAC GAC AGGT T TAT TAG CT CAT GT T T T AAAAAA 
TAT T AC AG AC AC T T CT T T C C T AAT T GGAG AT G GT AC AG GACGT G G T T CT G C T AAT G C T AA 
TTACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATA 
CTCAATTATTACCAATATTGATTTTGACCATCCTGATTATTTTACAGGCCTAGAGGACGT
ATTCAATGCCTTTAATGACTATGCTAAGCAAGTTCAAAAAGGTTTATTCATTTATGGAGA
AGAT C C AAAAC T T CAT GAAAT C AC T T C T G AG G C ACC AAT AT AT TAT TAT GGT T T T GAAG A 
T T C AAAT GAT T T TAT AG C AAAAG AC AT C AC T CG AAC T GT T AAT GG T T C T G ACT T T AAG G T 
T T T C TAT AAC C AAG AAG AAAT T G GT C AGT T T CAT GT AC CAG CAT AC GGT AAAC AT AAT AT 
CTTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGT 
AG CT G AG CAT T T GAAG AC AT T T T C AG GG GT AAAG CGTCGTTT T AC T GAG AAG AT TAT T GA 
C GAT AC T G T CAT TAT T GAT GAC T T T G C T C AC CAT C C T AC T GAGAT TAT T G C GAC AT TAG A 
TGCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTT 
C AC T CGT AC GAT AGC T C T T T TAG AC GAAT T T G C C CAT G C C T T GAGT C AAG C GG AT AG CGT 
TTATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGA
AGAT T TAG C T G CT AAG AT T GT CAAAC ACT C AGAT T T AGT GAC AGT C G AAAAT GT C T C GC C 
TTTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTA 
TGAGCGCTCTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA 

SEQ ID NO. 4608 
STRAIN CJB110 

AAAAAGCAGGCTCTAGTGACGTtGACAAATAtTATTTTACCCAACGTGGTTTAGAGCAAGCA 
GGTGTAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGATTATTGCA 
GG AAAT GCTTT T CGT CC AGAT AACAATGAAG AGT TGGCTT AT GTT AT TGAAAAGGGCT AT 
CAT T T T AAAC G AT AT CAT G AAT T T C T CGG AGAT T T TAT G C G T C AGT T C AC TAG T C T AG G T 
GT AG CT GGG GC AC AT G G AAAAAC CT C AAC G AC AGGT T TAT TAG C T C AT GT T T T AAAAAAT 
AT T AC AG AC AC TTCTTTCC T AAT T G G AGAT GG T AC AGGACGT GG T T C T GC T AAT G C T AAT 
TACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCCAGAATAC 
T C AAT TAT T AC C AAT AT T GAT T T T GAC CAT C C T GAT TAT T T TAG AG G CC TAG AGGACGT A 
T T C AAT GCTTT T AAT G ACT AT G C T AAG C AAGT T C AAAAAGGT T TAT T CAT T T AT GG AGAA 
GAT T C AAAACT T CAT G AAAT C AC T T C T AAG G C AC C AAT AT AT TAT TAT G G T T T T GAAGAT 
T C AAAT GAT T T T AT AGC AAAAG AC AT C ACT CG AAC T GT T AAT G G TT CT GAC T T T AAGGT T 
T T CT AT AAC C AAG AAG AAAT T GGT C AGT T T CAT G T AC C AG C AT ACGGT AAAC AT AAT AT C 
TTAAATGCAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTA 
G C T GAG CAT T T G AAGAC AT T T T C AGG G G T AAAAC G T C GT T T TACT GAG AAG AT TAT T GAC 
GATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGAT 
GCTGCTCGACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTC 
ACT CGT AC GAT AG C T C T T T TAG AC GAT T T T GC C CAT GCTTT GAGT C AAG CGGAT AG CGT T 
TATCTTGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAA 
GAT T TAG C T G C T AAG AT T G T CAAAC ACT C AGAT T T AGT GAC AG T C G AAAAT GTCTCGCCT 
TTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTAT 
GAG CGCTCTTTT G AAG AAT TAT TAG C T AAC C T AAC T AAAAAT AC AC AA 

SEQ ID NO. 4609 

STRAIN JM9130013 (reverse complement) 

GTT C AAAAAAG C AGG C T C T AGT GACG T T GAC AAAT AT TAT T T T AC T C AAC GTG GT T TAG A 
GCAAGCAGGTATAACTATATTACCTTTCTCACCGAATAATATCAGTGAGGATTTAGAGAT 
TATTGCAGGAAATGCTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAA 
GGGCTATCATTTTAAACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAG 
T C T AGGT GT AG C T GGG G C AC AT G G AAAAAC C T C AAC G AC AG GT T T AT T AG CT C AT GT T T T 
AAAAAAT AT T AC AG AC AC TTCTTTCC T AAT T G G AGAT G GT AC AG G AC GT GGT T C T G C T AA 
TGCTAATTACTTTGTGTTTGAAGCTGATGAATACGAACGTCATTTTATGCCGTACCATCC 
AGAAT ACT C AAT TAT T AC C AAT AT T GAT T T T GAC CAT C CT G AT TAT T T T AC AG G C CT AG A 
GG ACGT AT T C AAT G C T T T T AAT GAC T AT GCT AAG C AAG T T C AAAAAGGT T TAT T CAT T T A 
T G GAGAAGAT C C AAAAC T T CAT G AAAT C ACT T C T GAG G C AC C AAT AT AT TAT TAT G GT T T 
TGAAGATTCAAATGATTTTATAGCAAAAGATATCACTCGAACTGTTAATGGTTCTGACTT 
T AAGGT T T T C TAT AAC C AAG AAG AAAT T GG T C AGT T T C AC G T AC C AGC AT AC G GT AAAC A 
T AAT AT CT T AAAT GC AACT GCT GT TAT T G CT AAC C T T T AC AT AAT G GGAAT T GAT AT G G C 
ATT AGT AGCTGAGC ATT TG AAGAC AT TTTCAGGGGT AAAAC GTCGTTTT ACT GAGAAAAT 
TATTGACGATACTGTCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGAC 
AT T AGAT G C T G CT C GAC AAAAAT AC C C GT C AAAAG AAAT T G TAG CT AT T TT C C AAC CG C A 
T AC GT T C AC T C GT ACG AT AGC T CT T T T AGAC G AAT T T G C C CAT GC C T T GAGT C AAGCG G A 
TAGCGTTTATCTCGCTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAA 
GGTAGAAGATTTAGCTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGT 
CTCGCCTTTACTCAATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCA 
AT T GT AT G AG CG C T CT T T T G AAG AAT TAT T AG CT AAC CT AAC T AAAAAT AC AC AA 

SEQ ID NO. 4610 

STRAIN COH1 reverse complement 

CAGGCTCTAGTGACGTGACAAATATtATTTTACCCAACGTGGTTAGAGCAAGCAGGTGTAA 
C T AT AT T AC C T T T C T C AC C G AAT AAT AT C AGT G AGG AT T T AGAGAT TAT T G C AG G AAAT G 
CTTTTCGTC C AG AT AAC AAT G AAG AG T T G G C T T AT GT TAT T G AAAAG GG CT AT CAT T T T A 
AAC GAT AT CAT G AAT T T CT C GG AG AT T T TAT G C GT C AGT T C AC T AGT C T AG GTG T AG CT G 
GGGC AC AT GG AAAAAC CTCAACG AC AGGTTT ATT AGCTCATGTTTT AAAAAAT ATT AC AG 
ACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATTACTTTG 
T G T T T G AAG C T GAT G AAT AC G AAC GT CAT T T TAT G C C G T AC C AT C C AG AAT ACT C AAT T A 
T T AC C AAT AT T GAT T T T GAC CAT C C T GAT TAT T T T AC AGG C CT AG AG GAC G TAT T C AAT G 
C C T T T AAT GAC TAT G C T AAG C AAGT T C AAAAAG GT T TAT T CAT T TAT GG AG AAG AT C C AA 
AAC T T CAT GAAAT C AC TT C T G AGG C AC C AAT AT AT TAT TAT GGT T T T G AAG AT T C AAAT G
ATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTTTTCTATA 
AC C AAGAAGAAAT T G GT C AGT T T C AT GT AC C AG C AT AC GGT AAAC AT AAT AT CT T AAATG 
CAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCTGAGC 
ATTTGAAGACATTTTCAGGGGTAAAGCGTCGTTTTACTGAGAAGATTATTGACGATACTG 
T CAT TAT T GAT G ACT T T G C T C AC CAT C CTAC T G AGAT TAT T G C G AC AT T AG AT GCT G C T C 
G AC AAAAAT AC C C GT C AAAAG AAAT T GT AG CT AT T T T C C AAC CG C AT ACGT T C AC T C G T A 
CGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTATCTCG 
CTCAAATATATGGTTCTGCTAGAGAAGTAGATAATGGTGAGGTGAAGGTAGAAGATTTAG 
CTGCTAAGATTGTCAAACACTCAGATTTAGTGACAGTCGAAAATGTCTCGCCTTTACTCA 
AT CAT GAT AAT G CT GT C T AT GT CT T T AT G GGT GCT GG AG AC AT T C AAT T GT AT GAG C G C T 
C T T T T GAAG AAT TAT T AGC T AACC TAACT AAAAAT AC AC AA 

SEQ ID NO. 4611 
STRAIN 2603 

atgtcaaaaacttatcattttattggtattaaaggatccggaatgagtgccctagcactg 
atgcttcatcaaatgggacataacgtccaaggaagtgacgttgacaaatattattttacc 
caacgtggtttagagcaagcaggtgtaactatattacctttctcaccgaataatatcagt 
gaggatttagagattattgcaggaaatgcttttcgtccagataacaatgaagagttggct 
tatgttattgaaaagggctatcaatttaaacgatatcatgaatttctcggagattttatg 
cgtcagttcactagtctaggtgtagctggggcacatggaaaaacctcaacgacaggttta 
ttagctcatgttttaaaaaatattacagacacttctttcctaattggagatggtacagga 
cgtggttctgctaatgctaattactttgtgtttgaagctgatgaatacgaacgtcatttt 
atgccgtaccatccagaatactcaattattaccaatattgattttgaccatcctgattat 
tttacaggcttagaggacgtattcaatgcctttaatgactatgctaagcaagttcaaaaa 
ggtttattcatttatggagaagatccaaaacttcatgaaatcacttctgaggcaccaata 
tattattatggttttgaagattcaaatgattttatagcaaaagacatcactcgaactgtt 
aatggttctgactttaaggttttctataaccaagaagaaattggtcagtttcatgtacca 
gcatacggtaaacataatatcttaaatgcaactgctgttattgctaacctttacataatg 
ggaattgatatggcattagtagctgagcatttgaagacgttttcaggggtaaagcgtcgt 
tttactgagaagattattgacgatactgtcattattgatgactttgctcaccatcctact 
gagattattgcgacattagatgctgctcgacaaaaatacccgtcaaaagaaattgtagct 
attttccaaccgcatacgttcactcgtacgatagctcttttagacgaatttgcccatgcc 
ttgagtcaagcggatagcgtttatctcgctcaaatatatggttctgctagagaagtagat 
aatggtgaggtgaaggtagaagatttagctgctaagattgtcaaacactcagatttagtg 
acagtcgaaaatgtctcgcctttactcaatcatgataatgctgtctatgtctttatgggt 
gctggagacattcaattgtatgagcgctcttttgaagaattattagctaacctaactaaa 
aatacacaa 

SEQ ID NO. 4612 

STRAIN COH1 reverse complement 

C AGG C T CT AGT GAC G T t GAC AAAT At TAT T T T AC C C AAC GT GG t T TAG AG C AAG C AG G T GT AA 
CT AT AT T AC C T T T C T C AC C G AAT AAT AT C AGT GAG GAT T TAG AG AT TAT T GC AG GAAAT G 
CTTTTCGTCCAGATAACAATGAAGAGTTGGCTTATGTTATTGAAAAGGGCTATCATTTTA 
AACGATATCATGAATTTCTCGGAGATTTTATGCGTCAGTTCACTAGTCTAGGTGTAGCTG 
G GG C AC AT GG AAAAAC C T C AAC GAC AGG T T TAT T AG CT CAT GT T T T AAAAAAT AT T AC AG 
ACACTTCTTTCCTAATTGGAGATGGTACAGGACGTGGTTCTGCTAATGCTAATTACTTTG 
T G T T T GAAG CT GAT G AAT AC G AACGT CAT T T T AT G CC GT AC C AT C C AG AAT ACT C AAT T A 
T T AC C AAT AT T GAT T T T GAC CAT C C T GAT TAT T T T AC AGG C CT AGAGG ACGT AT T C AAT G 
C C T T T AAT GAC TAT G C T AAG C AAG T T C AAAAAGGT T T AT T CAT T T AT GG AG AAG AT C C AA 
AACTTCATGAAATCACTTCTGAGGCACCAATATATTATTATGGTTTTGAAGATTCAAATG 
ATTTTATAGCAAAAGACATCACTCGAACTGTTAATGGTTCTGACTTTAAGGTTTTCTATA 
AC C AAGAAGAAAT T G G T C AG T T T CAT GT AC C AG CAT AC G GT AAAC AT AAT AT CT T AAAT G 
CAACTGCTGTTATTGCTAACCTTTACATAATGGGAATTGATATGGCATTAGTAGCTGAGC 
AT T T G AAGAC AT T T T C AG G G GT AAAG C GT CGT T T T AC T GAG AAG AT TAT T GAC GAT AC T G 
TCATTATTGATGACTTTGCTCACCATCCTACTGAGATTATTGCGACATTAGATGCTGCTC 
GACAAAAATACCCGTCAAAAGAAATTGTAGCTATTTTCCAACCGCATACGTTCACTCGTA 
CGATAGCTCTTTTAGACGAATTTGCCCATGCCTTGAGTCAAGCGGATAGCGTTTATCTCG 
C T C AAAT AT AT G GT T C T G C TAG AG AAG TAG AT AAT G G T G AGG T G AAGGT AG AAG AT T T AG 
C T G CT AAG AT T GT C AAAC AC T C AG AT T T AGT GAC AGT C G AAAAT GT CTCGCCTTTACT C A 
ATCATGATAATGCTGTCTATGTCTTTATGGGTGCTGGAGACATTCAATTGTATGAGCGCT 
CTTTTGAAGAATTATTAGCTAACCTAACTAAAAATACACAA 

SEQ ID NO. 4613 
STRAIN A909 frame: 2

DKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGYHFKRYHE
FLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANANYFVFEAD
EYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGEDPKLHEI
TSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNILNATAVI
ANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLDAARQKYP
SKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVEDLAAKIV
KHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ 

SEQ ID NO. 4614 
STRAIN 1169NT frame: 2 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE 
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ 

SEQ ID NO. 4615 
STRAIN 090 frame: 1

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DSKLHEITSKAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDDFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4616 
STRAIN H36B frame: 2 

KAGSSDVDKYYFTQRGLEQAGITILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4617 
STRAIN 18RS21 frame: 1 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4618 
STRAIN M732 frame: 2 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY 
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4619 

STRAIN JM9130013 frame: 2 

FKKAGSSDVDKYYFTQRGLEQAGITILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEK
GYHFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSAN 
ANYFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIY 
GEDPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKH
NILNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIAT
LDAARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVK
VEDLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4620 
STRAIN M781 frame: 1 

RAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DPKLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI 
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4621 
STRAIN CJB110 frame: 3 

KAGSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGY
HFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANAN 
YFVFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGE 
DSKLHEITSKAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNI
LNATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLD
AARQKYPSKEIVAIFQPHTFTRTIALLDDFAHALSQADSVYLAQIYGSAREVDNGEVKVE
DLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4622 
STRAIN 2603 frame: 1 

MSKTYHFIGIKGSGMSALALMLHQMGHNVQGSDVDKYYFTQRGLEQAGVTILPFSPNNIS
EDLEIIAGNAFRPDNNEELAYVIEKGYQFKRYHEFLGDFMRQFTSLGVAGAHGKTSTTGL
LAHVLKNITDTSFLIGDGTGRGSANANYFVFEADEYERHFMPYHPEYSIITNIDFDHPDY
FTGLEDVFNAFNDYAKQVQKGLFIYGEDPKLHEITSEAPIYYYGFEDSNDFIAKDITRTV
NGSDFKVFYNQEEIGQFHVPAYGKHNILNATAVIANLYIMGIDMALVAEHLKTFSGVKRR
FTEKIIDDTVIIDDFAHHPTEIIATLDAARQKYPSKEIVAIFQPHTFTRTIALLDEFAHA
LSQADSVYLAQIYGSAREVDNGEVKVEDLAAKIVKHSDLVTVENVSPLLNHDNAVYVFMG
AGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4623 
STRAIN COH1 frame: 3 

GSSDVDKYYFTQRGLEQAGVTILPFSPNNISEDLEIIAGNAFRPDNNEELAYVIEKGYHF
KRYHEFLGDFMRQFTSLGVAGAHGKTSTTGLLAHVLKNITDTSFLIGDGTGRGSANANYF
VFEADEYERHFMPYHPEYSIITNIDFDHPDYFTGLEDVFNAFNDYAKQVQKGLFIYGEDP
KLHEITSEAPIYYYGFEDSNDFIAKDITRTVNGSDFKVFYNQEEIGQFHVPAYGKHNILN
ATAVIANLYIMGIDMALVAEHLKTFSGVKRRFTEKIIDDTVIIDDFAHHPTEIIATLDAA
RQKYPSKEIVAIFQPHTFTRTIALLDEFAHALSQADSVYLAQIYGSAREVDNGEVKVEDL
AAKIVKHSDLVTVENVSPLLNHDNAVYVFMGAGDIQLYERSFEELLANLTKNTQ

SEQ ID NO. 4701 
STRAIN A909 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4702 
STRAIN H36B 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4703 
STRAIN 18RS21 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4704 
STRAIN M732 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4705 
STRAIN COH1 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAA
GATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4706 
STRAIN M781 

TATTTTTTAACAACAAAAAAAGGAAAAGAGC
TAAGGAAAAATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAA
GAATATCATCAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGC
TGTTGATACTTTTAAAGATTATAAAGGTAAATTTGAATCAGGTGAATTGA
CAACAGAGGATATCGTCTCAGCCGTTAAGGAAAAAAGCGGAGAAGTAGTT
GACTTTGCTAATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGACGA
GGATACTGCTAAAAAAGAAGATAAGGCTCCTGAAACAAAAGTAGAAGATA
TTGTCATTGATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4707 
STRAIN 2603 

tattttttaacaacaaaaaaaggaaaagagctaaggaaaaatgcagaaaa 
attctatggagaatataaagaaaatccagaagaatatcatcaaatagcta 
aagataaagcaagtgaatattcaaatttagctgttgatacttttaaagat 
tataaaggtaaatttgaatcaggtgaattgacaacagaggatatcgtctc 
agccgttaaggaaaaaagcggagaagtagttgactttgctaatgattttg 
tcaatcaagctaaatcaaaattctcagacgaggatactgctaaaaaagaa 
gataaggctcctgaaacaaaagtagaagatattgtcattgattataaaga 
aaacacagaagataaagaaaaa 

SEQ ID NO. 4708 
STRAIN 090 

TATTTTTTaACaACAAAAAAAGGAAAAGAGCTAAGGAAAAATGCAGAAAA 
ATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCATCAAATAGCTA
AAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATACTTTTAAAGAT
TATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGGATATCGTCTC
AGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCTAATGATTTTG
TCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGCTAAAAAAGAa
GATAAGGCTCCTGAAACAAAaGTAGAAGATATTGTCATTGATTATAAAGA
AAACACAGAAGATAAAGAAAAA 

SEQ ID NO. 4709 
STRAIN CJB110 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAAA
ATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCAT
CAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATAC
TTTTAAAGATTATAAAGGTAAATTTGAATCAGGTgAATTGACAACAGAGG
ATATCGTCTCAGCCGtTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCT
AATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGC
TAAAAAAGAAGATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTG
ATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4710 
STRAIN 1169NT 

TATTTTTTAACAACAAAAAAAGGAAAAGAGCTAAGGAAA 
AATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCA
TCAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATA
CTTTTAAAGATTATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAG
GATATCGTCTCAGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGC
TAATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGATGAGGATACTG
CTAAAAAAGAAAATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATT
GATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4711 
STRAIN JM9130013 

TATTTTTTAaCAACAAAAAAAGGAAAAGAGCTAAGGAAAA
ATGCAGAAAAATTCTATGGAGAATATAAAGAAAATCCAGAAGAATATCAT
CAAATAGCTAAAGATAAAGCAAGTGAATATTCAAATTTAGCTGTTGATAC
TTTTAAAGATTATAAAGGTAAATTTGAATCAGGTGAATTGACAACAGAGG
ATATCGTCTCAGCCGTTAAGGAAAAAAGCGGAGAAGTAGTTGACTTTGCT
AATGATTTTGTCAATCAAGCTAAATCAAAATTCTCAGACGAGGATACTGC
TAAAAAAGAAGATAAGGCTCCTGAAACAAAAGTAGAAGATATTGTCATTG
ATTATAAAGAAAACACAGAAGATAAAGAAAAA

SEQ ID NO. 4712
STRAIN 2603 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL 
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4713 
STRAIN A909 frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4714 
STRAIN H36B frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL 
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE 
DKEK 

SEQ ID NO. 4715 
STRAIN 18RS21 frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4716 
STRAIN M732 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE 
DKEK 

SEQ ID NO. 4717 
STRAIN COH1 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4718 
STRAIN M781 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4719 
STRAIN 090 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4720 

STRAIN CJB110 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4721 
STRAIN 1169NT frame: 1 

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKENKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO. 4722 

STRAIN JM9130013 frame: 1

YFLTTKKGKELRKNAEKFYGEYKENPEEYHQIAKDKASEYSNLAVDTFKDYKGKFESGEL
TTEDIVSAVKEKSGEVVDFANDFVNQAKSKFSDEDTAKKEDKAPETKVEDIVIDYKENTE
DKEK 

SEQ ID NO: 4801 
STRAIN 2603 

aatagtactgagacaagtgcttcagtagttcctactacaaatactatcgt 
tcaaactaatgacagtaatcctaccgcaaaatttgtatcagaatcaggac 
aatctgtaataggtcaagtaaaaccagataattctgcggcgcttacaaca 
gttgacacgcctcatcatatttcagctccagatgctttaaaaacaactca 
atcaagtcctgtcgttgagagtacttctactaagttaactgaagagactt 
acaaacaaaaagatggtcaagatttagccaacatggtgagaagtggtcaa 
gttactagtgaggaactcgttaatatggcatacgatattattgctaaaga 
aaacccatctttaaatgcagtcattactactagacgccaagaagctattg 
aagaggctagaaaacttaaagataccaatcagccgtttttaggtgttccc 
ttgttagtcaaggggttagggcacagtattaaaggtggtgaaaccaataa 
tggcttgatctatgcagatggaaaaattagcacatttgacagtagctatg 
tcaaaaaatataaagatttaggatttattattttaggacaaacgaacttt 
ccagagtatgggtggcgtaatataacagattctaaattatacggtctaac 
gcataatccttgggatcttgctcataatgctggtggctcttctggtggaa 
gtgcagcagccattgctagcggaatgacgccaattgctagcggtagtgat 
gctggtggttctatccgtattccatcttcttggacgggcttggtaggttt 
aaaaccaacaagaggattggtgagtaatgaaaagccagattcgtatagta 
cagcagttcattttccattaactaagtcatctagagacgcagaaacatta 
ttaacttatctaaagaaaagcgatcaaacgctagtatcagttaatgattt 
aaaatctttaccaattgcttatactttgaaatcaccaatgggaacagaag 
ttagtcaagatgctaaaaacgctattatggacaacgtcacattcttaaga 
aaacaaggattcaaagtaacagagatagacttaccaattgatggtagagc 
attaatgcgtgattattcaaccttggctattggcatgggaggagcttttt
caacaattgaaaaagacttaaaaaaacatggttttactaaagaagacgtt 
gatcctattacttgggcagttcatgttatttatcaaaattcagataaggc 
tgaacttaagaaatctattatggaagcccaaaaacatatggatgattatc 
gtaaggcaatggagaagcttcacaagcaatttcctattttcttatcgcca 
acgaccgcaagtttagcccctctaaatacagatccatatgtaacagagga 
agataaaagagcgatttataatatggaaaacttgagccaagaagaaagaa 
ttgctctctttaatcgccagtgggagcctatgttgcgtagaacacctttt 
acacaaattgctaatatgacaggactcccagctatcagtatcccgactta 
cttatctgagtctggtttacccatagggacgatgttaatggcaggtgcaa 
actatgatatggtattaattaaatttgcaactttctttgaaaaacatcat 
ggttttaatgttaaatggcaaagaataatagataaagaagtgaaaccatc 
tactggcctaatacagcctactaactccctctttaaagctcattcatcat 
tagtaaatttagaagaaaattcacaagttactcaagtatctatctctaaa 
aaatggatgaaatcgtctgttaaaaataaaccatccgtaatggcatatca 
aaaagca 

SEQ ID NO: 4802 
STRAIN 090 

AAT AG T ACT G AG AC AAGT G CT T C AGT AGT T C C T AC T AC AA 
AT ACT AT C G T T C AAAC T AAT G AC AGT AAT C C T AC C G C AAAAT T T GT AT C A 
GAAT C AG G AC AAT C T G T AAT AG G T C AAGT AAAAC C AG AT AAT TCTGCGGC 
GCTTACAACAGTTGACACGCCTCATCATATTTCAGCTCCAGATGCTTTAA 
AAAC AAC T C AAT C AAGT CCTGTCGTT GAG AGT AC T T C T AC T AAGT T AACT 
G AAG AGAC T T AC AAAC AAAAAGAT GGT AAAG AT T T AG C C AAC AT G G T GAG 
AAGT G GT C AAGT T AC T AGT GAG GAAC T C G T T AAT AT G G C AT ACG AT AT T A 
T T G C T AAAG AAAAC C C AT CT T T AAAT G C AGT CAT T AC T AC T AG ACG C C AA 
GAAG C TAT T GAAG AG G C T AGAAAAC T T AAAG AT AC C AAT C AG CCGTTTTT 
AGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCACAGTATTAAAGGTGGTG 
AAAC C AAT AAT G GC T T GAT C TAT G C AG AT GG AAAAAT T AG C AC AT T T G AC 
AGT AG C T AT GT C AAAAAAT AT AAAG AT T T AGGAT T T AT TAT T T T AGGAC A 
AACGAACTTTCCAGAGTATGGGTGGCGTAATATAACAGATTCTAAATTAT 
ACGGTCTAACGCATAATCCTTGGGATCTTGCTCATAATGCTGGTGGCTCT 
TCTGGTGGAAGTGCAGCAGCCATTGCTAGCGGAATGACGCCAATTGCTAG 
CGGTAGTGATGCTGGTGGTTCTATCCGTATTCCATCTTCTTGGACGGGCT 
T G GT AG G T T T AAAAC C AAC AAG AG GAT T G GTG AGT AAT G AAAAG C C AG AT 
T CG TAT AGT AC AG C AGT T CAT T T T C C ATT AAC T AAGT CAT C TAG AGAC G C 
AG AAAC AT TAT T AAC T TAT C T AAAGAAAAG CG AT C AAAC G CT AG TAT C AG 
T T AAT GAT T T AAAAT CT T t AC C AAT T G C T TAT AC T T T GAAAT C AC C AAT G 
G GAAC AG AAGT T AGT C AAG AT GC T AAAAACG CT AT TAT G G AC AACGT C AC 
AT T C T T AAG AAAAC AAG GAT T C AAAGT AAC AG AG AT AG AC T T AC C AAT T G 
AT GGT AG AG CAT T AAT G C G T GAT TAT T C AAC C T T GG CT AT T GGC AT GGG A 
GG AG C T T T T T C AAC AAT T GAAAAAGACT T AAAAAAAC AT G G T T T T AC T AA 
AG AAG ACGT T GAT C C T AT T AC T T G GG C AG T T CAT GT T AT T TAT C AAAAT T 
C AG AT AAGG C T G AACT T AAG AAAT C T AT T AT GG AAGC C C AAAAAC AT AT G 
GAT GAT TAT CG T AAG G C AAT GG AG AAG C T T C AC AAG C AAT T T C CT AT T T T 
C T T AT C G C C AAC G AC C G C AAGT TT AGC C CC T C T AAAT AC AG AT C CAT AT G 
T AAC AG AG GAAG AT AAAAG AGC GAT T T AT AAT AT GG AAAACT T GAG C C AA 
GAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAG 
AAC AC C T T T TAG AC AAAT T GC T AAT AT G AC AG GAC T C C C AG C T AT C AG T A 
TCCCGACTTACTTATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATG 
G C AG GT G C AAAC TAT GAT AT G G T AT T AAT T AAAT T T G C AAC T T T C T T T G A 
AAAAC AT CAT GGT T T T AAT G T T AAAT GG C AAAG AAT AAT AG AT AAAGAAG 
TGAAACCATCTACTGGCCTAATACAGCCTACTAACTCCCTCTTTAAAGCT 
CAT T CAT CAT TAG T AAAT T T AG AAGAAAAT T C AC AAGT TACT C AAGT AT C 

TATCTCTAAAAAATGGATGAAATCGTCTGTTAAAAATAAACCATCCGTAA 
T GG CAT AT C AAAAAG C A 

SEQ ID NO: 4803 
STRAIN A909 

TACT AC AAAT ACTATCGTTCAAACT AAT GAC AGT AAT CCTACCGCAAAAT 
T T GT AT C AG AAT C AG GAC AAT C T GT AAT AG G T C AAG T AAAAC C AG AT AAT 
TCTGCGGCGCT T AC AAC AGT T G AC AC GC CT CAT CAT AT T T C AG C T C C AG A 
T G C T T T AAAAAC AAC T C AAT C AAGT C CT GT CGT T GAG AGT AC TTCTACTA 
AGT T AAC T GAAG AG ACT T AC AAAC AAAAAG AT GGT C AAGAT T TAG C C AAC
ATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGTTAATATGGCATA 
C GAT AT TAT T G C T AAAG AAAAC C CAT CT T T AAAT GC AGT CAT T AC TACT A 
G AC GC C AAG AAG C TAT T GAAGAGGC T AG AAAAC T T AAAG AT AC CAAT C AG 
CCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCACAGTATTAA 
AG GT G GT G AAAC C AAT AAT GGC T T GAT CT AT GC AGAT GGAAAAAT TAG C A 
CAT T T G AC AG TAG C TAT GT C AAAAAAT AT AAAG AT T TAGGAT T TAT TAT T 
T T AGG AC AAACG AACT T T C C AG AGT AT GGG T GG CG T AAT AT AAC AG AT T C 
TAAATTATACGGTCTAACGCATAATCCTTGGGATCTTGCTCATAATGCTG 
GTGGCTCTTCTGGTGGAAGTGCAGCAGCCATTGCTAGCGGAATGACGCCA 
ATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTCCATCTTCTTG 
G AC G GG C T T G GT AG G T T T AAAAC C AAC AAGAGG AT T GGT GAGT AAT G AAA 
AGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAAcTAAGTCATCT 
AGAGAC G C AGAAAC AT TAT T AACT TAT C T AAAG AAAAG CG AT C AAAC GC T 
AGT AT C AGT T AAT GAT T T AAAAT CT T T AC CAAT T G C T TAT AC T T T G AAAT 
C AC CAAT G G GAAC AG AAG T T AGT C AAG AT G CT AAAAACG CT AT TAT G G AC 
AAC G T C AC a T T C T T AAG AAAAC AAGG AT T C AAAG T AAC AG AGAT AGAC T T 
AC CAAT T GAT G G T AGAG CAT T AAT GCGT GAT TAT T C AAC CT T GGC TAT T G 
G CAT GGGAG G AG CT T T T T C AAC AAT T G AAAAAGAC T T AAAAAAAC AT GGT 
T T T AC T AAAG AAGACGT T GAT C CT AT T AC T T G G G C AGT T C AT GT T AT T T A 
T C AAAAT T C AG AT AAG G C T G AAC TT AAG AAAT C TAT TAT G GAAGC C C AAA 
AAC AT AT GG AT GAT TAT C GT AAG G C AAT GG AGAAG C T T C AC AAG CAAT T T 
CCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCTAAATACAGA 
T C CAT AT GT a AC AGAG G AAG AT AAAAGAG C GAT T TAT AAT AT G GAAAAC T 
TGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGGAGCCTATG 
T T G C G TAG AAC AC C T T T T AC AC AAAT T G CT AAT AT GAC AGG AC T C C C AGC 
TAT C AGT AT C C C G AC T T AC T TAT C T G AGT C T G G T T T AC C CAT AGG G AC G A 
T GT T AAT GG CAGGT G C AAAC TAT GAT AT G GT AT T AAT T AAAT T T G C AAC T 
T T C T T T G AAAAAC AT CAT G GT T T T AAT GT T AAAT G G C AAAG AAT AAT AG A 
T AAAGAAGT G AAAC CAT C T AC T GGC C T AAT AC AG C C T AC T AAC T C C C T CT 
T T AAAG C T CAT T CAT CAT TAG T AAAT T T AGAAG AAAAT T C AC AAGT T AC T 

CAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGTTAAAAATAAACC 
AT C CGT AAT G G CAT AT C AAAAAG C A 

SEQ ID NO: 4804 
STRAIN COH1 

AAT AGT AC T G AGAC AAGT G C T T C AGT AG C T C C T AC T AC AAAT 
ACT AT CGT T C AAACT AAT GAC AGT AAT C C T AC C G C AAAAT T T G CAT C AG A 
AT C AG GAC AAT C TGT AAT AGGT C AAGT AAAAC C AG C T AAT TCTGCGGCGC 
T T AC AAC AGT T GAC AC G C C T CAT AT T T C AG C T C C AG AT G C T T T AAAAAC A 
ACT CAAT C AAGT C C T GT CGT T GAG AGT C CT T C TACT AAG T T AAC T GAAG A 
GACATACAAACAAAAAGATGGTCAAGATTTAGCCAACATGGTGAGAAGTG 
GTCAAGTTACTAGTGAGGAACTCGTCAATATGGCATACGATATTATCGCT 
AAAG AAAAC C CAT CT T T AAAT G C AGT C AT T AC T AC TAG AC GC C AAG AAG C 
CAT T GAAG AG G C TAG AAAAC T T AAAG AT AC T AAT C AG C C G T T T T T AGGT G 
T T C C c T T GT T AGT C AAG G G G T TAG G G C AC AGT AT T AAAG GT GGT G AAAC C 
AAT AAT G G C T T GAT C T AT G C AG AT G G AAAAAT T AG C AC AT T T GAC AGT AG 
C T AT GT C AAAAAAT AT AAAG AT T TAG GAT T TAT TAT T T TAG GAC AAAC G A 
ATTTTCCAGAGTATGGGTGGCGTAATATAACAGACTCTAAATTATACGGT 
CCAACGCATAATCCTTGGAATCTTGCTCATAACGCTGGTGGCTCTTCTGG 
TGGAAGTGCAGCAGCTATTGCTAGCGGAATGACGCCAATTGCTAGCGGCA 
GTGATGCTGGTGGTTCTATCCGTATTCCATCTTCTTGGACGGGCTTAGTA 
G GT T T AAAAC C AAC AAG AGG AT T GG T GAGT AAT G AAAAG C C AG AT T C GT A 
T AGT AC AG C AGT T C AT TT T C CAT T AAC T AAG T CAT CT AG AG AC G C AG AAA 
CAT T GT T AAC T T AC C T AAAG AAAAG C GAT C AAAC G C T AGT AT C AG T T AAT 
GAT T T AAAAT C T T T AC CAAT T G CT T AT AC T T T G AAAT C AC CAAT G G GAAC 
AGAAG T T AGT C AAGAT G C T AAAAAT G C T AT T AT G GAC AAC GT C AC AT T C T 
T AAG AAAAC AAG GAT T C AAAG T GAC AG AG AT AGAT T t AC CAAT T G AT GG T 
AGAGCATTAATGCGTGATTATTCAACCTTGGCTATTGGCATGGGAGGAGC 
T T T T T C AAC AAT T G AAAAAG AC T T AAAAAAAC AT G GT T T T AC T AAAG AAG 
AC GT T GAT C C C AT T AC T T G GG C AGT T CAT G T T AT T TAT C AAAAT T C AGAT 
AAGG C T GAAC T T AAGAAAT C T AT T G T GG AAG C C C AAAAAC AT AT G GAT G A 
T T AT C GT AAG G CAAT GG AGAAG CT T C AC AAG CAAT T T C C T AT T T T CT T AT 
C G C C AACG AC C G C AAg T T T AG C C C C T C T AAAT AC AG AT C CAT AT GT AAC A 
GAG AAAG AT AAAAG AG C GAT T T AT AAT AT G GAAAACT T GAG C C AAGAAG A
AAGAATTGCTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAGAACAC 
CT T T T AC AC C AAT T GC T AAT At GAC AG G AC T C C C AG CT AT C AGT AT C C CG 
ACTTACTTATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATGGCAGG 
TGCAAACTATGATATGGTATTAATTAAATTTGCAACTTTCTTTGAAAAAC 
ATCATGGTTTTAATGTTAAATGGCAAAGAATAATAGATAAAGAAGTGAAA 
CCATCTGCTGACCTAATACAGCCTACTAACTCCCTCTTTAAAGCTCATTC 
AT CAT T AGT AAAT T T AGAAG AAAAT T C AC AAG T T AC T C AAGT AT C T AT C T 
CTAAAAAAT GG AT GAAAT C GT CT G T T AAAAAT AAAC CAT C CG T AAT GG C A 
TAT C AAAAAGC A 

SEQ ID NO: 4805 
STRAIN M732 

T C AGT AGCT C CT AC T AC AAAT ACT AT C GT T C AAAC T AAT G AC AGT AATCC 
TACCGCAAAATTTGCATCAGAATCAGGACAATCTGTAATAGGTCAAGTAA 
AAC C AGC T AAT TCTGCGGCGC T T AC AAC AGT T GAC AC G C CT CAT AT T T C A 
GCTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGAGTCC 
T T C T AC T AAGT T AAC T G AAGAGAC AT AC AAAC AAAAAG AT GGT C AAGAT T 
TAG C C AAC AT GGT G AG AAG T GGT C AAGT T AC T AGT GAG G AAC T C GT C AAT 
AT GG CAT AC GAT AT TAT C G C T AAAGAAAAC C CAT CT T T AAAT G C AG T CAT 
T AC T AC TAG AC G C C AAG AAG C CAT T G AAG AG G C T AG AAAAC T T AAAG AT A 
CTAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCAC 
AGTATTAAAGGTGGTGAAACCAATAATGGCTTGATCTATGCAGATGGAAA 
AAT TAG C AC AT T T GAC AGT AG C TAT G T C AAAAAAT AT AAAG AT T T AGGAT 
TTATTATTTTAGGACAAACGAATTTTCCAGAGTATGGGTGGCGTAATATA 
AC AG AC T C T AAAT T AT AC GG T CnAAC GC AT AAT C CT T G GGAT C T T G C T C A 
TAACGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGCTATTGCTAGCGGAA 
TGACGCCAATTGCTAGCGGCAGTGATGCTGGTGGTTCTATCCGTATTCCA 
T C T T C T T GGAC G GG C T T AGT AGGT T T AAAAC C AAC AAG AGGAT T G GT GAG 
T AAT G AAAAGC C AG AT T C GT AT AGT AC AG C AGT T CAT T T T C CAT T AACT A 
AGT CAT C TAG AG AC GC AG AAAC AT T GT T AAC T T AC C T AAAG AAAAG C GAT 
C AAAC G C TAG TAT C AGTT AAT GAT T T AAAAT C T T T AC C AAT T G C T TAT AC 
T T T GAAAT C AC C AAT GGG AAC AGAAG T T AGT C AAG AT G C T AAAAAT G CT A 
T TAT GGAC AAC GT C AC AT T C T T AAG AAAAC AAGG AT T C AAAGT G AC AGAG 
AT AG AT T T AC C AAT T GAT GGT AG AGC AT T AAT G CG T GAT TAT T C AAC C T T 
GG C TAT T GG C AT GGG AG GAG C T T T TT C AAC AAT T G AAAAAG ACT T AAAAA 
AACATGGTTTTACTAAAGAAGACGTT GAT CCCATTACTTGGGC AGTT CAT 
GTTATTTATCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTGTGGA 
AG C C C AAAAAC AT AT G GAT GAT TAT C GT AAGG C AAT G GAG AAG CT T C AC A 
AGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCTA 
AAT AC AG AT C CAT AT GTT AC AG AG AAAG AT AAAAG AG CG AT T TAT AAT AT 
GGAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGG 
AG C CT AT GT T GC GT AG AAC AC C T T T T AC AC C AAT T G CT AAT AT G AC AGG A 
CTCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCCAT 
AG G GAC GAT G T T AAT GG C AG GT GC AAAC TAT GAT AT GGT AT T AAT T AAAT 
TTGCAACTTTCTTTGAAAAACATCATGGTTTTAATGTTAAATGGCAAAGA 
AT AAT AG AT AAAG AAG T G AAAC CAT C T GC T GAC C T AAT AC AG C CT ACT AA 
CTCCCTCTT T AAAG CT CAT T CAT CAT T AG T AAAT T T AG AAGAAAAT T C AC 
AAGTT ACT C AAGT AT CT AT CTCTAAAAAATGGAT GAAAT CGTCT GTT AAA 
AAT AAAC CAT C C GT AAT G G CAT AT C AAAAAG C A 

SEQ ID NO: 4806 
STRAIN 18RS21 

AATAGTACTGAGACAAGTGCTTCAGTAGTTCCTACTACAAATACTATCGT 
T C AAACT AAT GAC AGT AAT CCT AC CG C AAAAT T T GT AT C AG AAT C AGG AC 
AAT C T GT AAT AG G T C AAGT AAAAC C AG AT AAT TCTGCGGCGC T T AC AAC A 
GTT GAC AC G C C T CAT CAT AT T T C AG C T C C AG AT G C T T T AAAAAC AAC T C A 
AT C AAGT CCTGTCGTT GAG AG TACT T CT ACT AAGT T AAC T G AAG AG AC T T 
ACAAACAAAAAGATGGTCAAGATTTAGCCAACATGGTGAGAAGTGGTCAA 
GT T AC TAG T GAG GAAC T C GT T AAT AT GG C AT AC GAT AT TAT T G C T AAAG A 
AAAC C C AT C T T T AAAT G C AGT CAT T AC TAG T AG AC GC C AAG AAG C TAT T G 
AAG AG G C TAG AAAAC T T AAAG AT AC C AAT C AG C C GT T T T T AGGT GT T C C C 
T T GT T AGT C AAGGGG T T AGGG C AC AGT AT T AAAGG T GGT G AAAC C AAT AA 
TGGCTTGATCTATGCAGATGGAAAAATTAGCACATTTGACAGTAGCTATG
TCAAAAAATATAAAGATTTAGGATTTATTATTTTAGGACAAACGAACTTT
CCAGAGTATGGGTGGCGTAATATAACAGATTCTAAATTATACGGTCTAAC 
GCATAATCCTTGGGATCTTGCTCATAATGCTGGTGGCTCTTCTGGTGGAA 
GTGCAGCAGCCATTGCTAGCGGAATGACGCCAATTGCTAGCGGTAGTGAT 
GCTGGTGGTTCTATCCGTATTCCATCTTCTTGGACGGGCTTGGTAGGTTT 
AAAAC C AAC AAG AGG AT T G GT GAGT AAT GAAAAG C C AG AT T CGT AT AG T A 
C AG C AGT T CAT T T T C CAT T AAC T AAG T CAT C TAG AG AC G C AG AAA CAT T A 
T T AAC T TAT C T AAAGAAAAG C GAT C AAACG CT AGT AT C AGT T AAT GAT T T 
AAAAT C T T T AC C AAT T G C T TAT AC T T T G AAAT C AC C AAT GG G AAC AGAAG 
T TAG T C AAG AT G CT AAAAACG C TAT TAT GG AC AACGT C AC AT T CT T AAGA 
AAAC AAGGAT T C AAAG T AAC AG AGAT AG AC T TAG C AAT T GAT G GT AG AG C 
ATTAATGCGTGATTATTCAACCTTGGCTATTGGCATGGGAGGAGCTTTTT 
C AAC AAT T G AAAAAG AC T T AAAAAAA CAT GG T T T T AC T AAAGAAG AC GT T 
GAT C C TAT TACT T G GG C AGT T CAT G T TAT T TAT C AAAAT T C AG AT AAG G C 
TGAACTTAAGAAATCTATTATGGAAGCCCAAAAACATATGGATGATTATC 
GTAAGGCAATGGAGAAGCTTCACAAGCAATTTCCTATTTTCTTATCGCCA 
AC GAC CG C AAGT T TAG C C C CT CT AAAT AC AGAT C CAT AT GT AAC AG AG G A 
AG a t AAAAG AG CG AT T TAT AAT AT G GAAAAC T T GAG C C AAG AAGAAAG AA 
TTGCTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAGAACACCTTTT 
AC AC AAAT T G C T AAT AT GAC AGG AC T C C C AG C TAT C AGT AT C C CGAC T T A 
CTTATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATGGCAGGTGCAA 
AC TAT GAT AT G GT AT T AAT T AAAT T T G C AAC T T T C T T T GAAAAAC AT CAT 
G G T T T T AAT GT T AAAT G G C AAAG AAT AAT AG AT AAAG AAG T GAAAC CAT C 
TACTGGCCTAATACAGCCTACTAACTCCCTCTTTAAAGCTCATTCATCAT 
T AGT AAAT T TAG AAG AAAAT T C AC AAGT T AC T C AAGT AT C T AT C T C T AAA 
AAAT G GAT G AAAT C G T C T GT TAAAAAT AAAC CAT C C GT AAT GG C AT AT C A 
AAAAG C A 

SEQ ID NO: 4807 
STRAIN M781 

T G C T T C AGT AGC T C C T AC T AC AAAT AC TAT C GT T C AAAC T AAT GAC AGT A 
AT C CT AC C G C AAAAT T T GC AT C AG AAT C AGG AC AAT CT G T AAT AGG T C AA 
GT AAAAC C AG C T AAT T C T G CGGCG CT T AC AAC AG T T GAC AC G C C T CAT AT 
TTCAGCTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGA 
GT C C TT C TACT AAG T T AAC T G AAG AGAC AT AC AAAC AAAAAG AT GGT C AA 
GAT T TAG C C AAC AT G G T GAG AAGT G G T C AAG T T AC TAG T GAG GAAC T CGT 
CAATATGGCATACGATATTATCGCTAAAGAAAACCCATCTTTAAATGCAG 
T CAT T AC T AC T AGAC G C C AAG AAG C CAT T G AAG AGG C TAG AAAAC T T AAA 
GATACTAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGG 
G C AC AGT AT t AAAG GT GGT GAAAC C AAT AAT GG C T T GAT CT AT G C AG AT G 
G AAAAAT TAG C AC AT T T GAC AGT AG C T AT G T CAAAAAATAT AAAG AT T T A 
GGATTTATTATTTTAGGACAAACGaATTTTCCAGAGTATGGGTGGCGTAA 
TATAACAGACTCTAAATTATACGGTCCAACGCATAATCCTTGGAaTCTTG 
CTCATAACGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGCTATTGCTAGC 
GGAATGACGCCAATTGCTAGCGGCAGTGATGCTGGTGGTTCTATCCGTAT 
TCCATCTTCTTGGACGGGCTTAGTAGGTTTAAAACCAACAAGAGGATTGG 
T GAGT AAT GAAAAG C C AG AT T C GT AT AG T AC AG C AGT T CAT T T T C C AT T A 
ACT AAGT CAT C TAG AG AC G C AG AAAC AT T G T T AAC T T AC C T AAAG AAAAG 
C GAT C AAAC G C T AGT AT C AGT T AAT GAT T T AAAa T C T T T AC C AAT T G C T T 
AT ACT T T G AAAT C AC C AAT G GG AAC AGAAg T T AGT C AAG AT G C TAAAAAT 
G CT AT T AT GG AC AAC GT C AC AT T C T T AAG AG AAC AAGGAT T C AAAGT G AC 
AGAG AT AG AT T T AC C AAT T GAT G GT AG AG CAT T AAT G C GT G AT TAT T C AA 
CCTTGGCTATTGGCATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTA 
AAAAAAC AT G GT T T T AC T AAAG AAG AC G T T GAT C C CAT TAG T T GG G C AGT 
T CAT G T T AT T T AT C AAAAT T C AG AT AAG G C T GAAC T T AAG AAAT C T AT T G 
T G GAAG C C C AAAAAC AT AT G GAT GAT TAT C GT AAGG C AAT GG AG AAG C T T 
CACAAGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCC 
T C T AAAT AC AG AT C CAT AT GT AAC AG a G a AAG AT AAAAG AG C G AT T TATA 
AT AT G GAAAAC T T GAG C C AAG AAG AAAG AAT TGCTCTCTT T AAT C G C C AG 
TGGGAGCCTATGTTGCGTAGAACACCTTTTACACCAATTGCTAATAtGAC 
AG G ACT C C C AG C T AT C AGT AT C C C G AC T T AC T T AT C T G AGT C T GGT T T AC 
C C AT AGGG AC G AT GT T AAT GGC AGGT G C AAAC TAT GAT AT GGT AT T AAT T 
AAATTTGCAACTTTCTTTGAAAAACATCATGGTTTTAATGTTAAATGGCA 
AAGAATAATAGATAAAGAAGTGAAACCATCTGCTGACCTAATACAGCCTA
CTAACTCCCTCTTTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAAT
T C AC AAGT T AC T C AAGT AT C T AT C T C T AAAAAAT G GAT G AAAT CGT C T GT 
T AAAAAT AAAC CAT C C GTAAT GG CAT AT C AAAAAG C A 

SEQ ID NO: 4810 
STRAIN CJB110 

T AGT T C CT ACT AC AAAT ACT AT C GT T C AAACT AAT G AC AG T AAT C CT AC C 
G C AAAAT T T GT AT C AG AAT C AGG AC AAT C T G T AAT AGGT C AAGT AAAAC C 
AGAT AAT T C T G C G G C GC T T AC AAC AGT TG AC AC G C CT CAT CAT AT T T C AG 
C T C C AGAT GC T T T AAAAAC AAC T C AAT C AAGT CCTGTCGTT GAG AGT AC T 
T C T AC T AAGT T AACT G AAG AG AC T T AC AAAC AAAAAG AT GGTAAAG AT T T 
AG C C AAC AT GGT G AGAAGT GGT C AAGT T AC T AGT GAG G AAC T CGT T AAT A 
T GG C AT AC GAT AT TAT T G C T AAAGAAAAC C CAT CT T T AAAT G C AGT CAT T 
ACTACTAGACGCCAAGAAGCTATTGAAGAGGCTAGAAAACTTAAAGATAC 
CAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCACA 
G T AT T AAAGGT GGT GAAAC C AAT AAT GGC T T GAT C TAT G C AG AT GG AAAA 
AT TAG C AC AT T T G AC AGT AG C T AT GT C AAAAAAT AT AAAG AT T TAG GAT T 
TATTATTTTAGGACAAACGAACTTTCCAGAGTATGGGTGGCGTAATATAA 
CAGATTCTAAATTATACGGTCTAACGCATAATCCTTGGGATCTTGCTCAT 
AATGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGCCATTGCTAGCGGAAT 
GACGCCAATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTCCAT 
CTTCTTGGACGGGCTTGGTAGGTTTAAAACCAACAAGAGGATTGGTGAGT 
CAT G AAAAG C C AG AT T C GT AT AGT AC AG C AGT T CAT T T T C CAT T AAC T AA 
GT CAT C TAG AG AC G CAG AAAC AT TAT T AAC T TAT C T AAAGAAAAG CG AT C 
AAACG C T AGT AT C AG T T AAT GAT T T AAAAT C T T T AC C AAT T G CT T AT AC T 
T T G AAAT C AC C AAT G G G AAC AGAAG T T AG T C AAG AT G C T AAAAAC G C T AT 
TAT G GAC AAC G T C AC AT T C T T AAGAAAAC AAG GAT T C AAAG T AAC AG AG A 
T AGAC T T AC C AAT T G AT GG T AGAG CAT T AAT G C G T GAT TAT T C AAC C T T G 
GCTATTGGCATGGGAgGAGCTTTTTCAACaATTGAAAAAGAcTTAaAAAA 
Ac AT G G T T T T AC T AAAG AAGAC GT T GAT C CT AT T AC T T GGG C AGT T CAT G 
T T AT T T AT C AAAAT T C AG AT AAG GC T G AAC T T AAGAAAT C T AT TAT GG AA 
G C C C AAAAAC AT AT G GAT GAT TAT CG T AAGG C AAT G GAG AAG C T T C AC AA 
GCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCTAA 
AT AC AGAT C C AT AT GT AAC AG AGG AAGAT AAAAGAG C GAT T TAT AAT AT G 
GAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTGGGA 
G C CT AT GT T GC GT AGAAC AC CT T T T AC AC AAAT T G C T AAT At GAC AG GAC 
TCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCCATA 
g GG AC g AT GT T AAT G G C AGG T G C AAACT AT GAT AT G GT AT T AAT T AAAT T 
TGC AACT TTCTTTGAAAAAC AT CATGGTTTTAATGTT AAAT GGC AAAGAA 
T AAT AG AT AAAG AAG T GAAAC CAT C T AC T GG C CT AAT AC AG C C TAG T AAC 
TCCCTCTT T AAAGC T CAT T CAT CAT T AGT AAAT T TAG AAG AAAAT T C AC A 
AGT T AC T C AAGT AT C T AT CT C T AAAAAAT GG AT G AAAT CGT C T GT T AAAA 
AT AAAC CAT C C GTAAT GG CAT AT C AAAAAG C A 

SEQ ID NO: 4811 
STRAIN 1169NT

AATAGTACTGAGACAAGTGCTTCAGTAGCTCCTACTACAAATACTATCGT
TCAAACTAATGACAGTAATCCTACCGCAAAATTTGCATCAGAATCAGGAC
AATCTGTAATATGTCAAGTAAAACCAGATAATTCTGCGGCGCTTACAACA
GTTGACACGCCTCATATTTCAGCTCCAGATGATTTAAAAACAACTCAATC
AAGTCCTGTCGTTGAGAGTACTTCTACTAAGTTAACTGAAGAGACATACA
AACAAAAAGATGGTCAAGATTTAGCCAACATGGTGAGAAGTGGTCAAGTT
ACTAGTGAGGAACTCGTCAATATGGCATACGATATTATTGCTAAAGAAAA
CCCTTCTTTAAATGCAGTCATTACTACTAGACGCCAAGAAGCCATTGAAG
AGGCTAGAAAACTTAAAGATACTAATCAGCCATTTTTAGGTGTTCCCTTG
TTAGTCAAGGGGTTAGGGCACAGTATTAAAGGTGGTGAAACCAATAATGG
CTTGATCTATGCAGATGGAAAAATtaGCACATTTGACAGTAGCTATGTCA
AAAAATATAAAGATTTAGGATTTATTATTTTAGGACAAACGAACTTTCCA
GAGTATGGGTGGCGTAATATAACAGATTCTAAATTATACGGTCCAACGCA
TAACCCTCGGAATCTTGCTCATAATGCTGGTGGCTCTTCTGGTGGAAGTG
CAGCAGCCATTGCTAGCGGrATGACGCCAATTGCTAGCGGTAGTGATGCT
GGTGGTTCTATCCGtATTCCATCTTCTTGGACGGGCTTGGTAGGTTTAAA
ACCAACAAGAGGATTGGTGAGTAATGAAAAGCCAGATTCGTATAGTACAG
CAGTTCATTTTCCATTAACTAAGTCATCTAGAGACGCAGAAACATTATTA
ACTTATCTAAAGAAAAGCGATCAAACGCTAGTATCAGTTAATGATTTAAA
ATCTTTACCAATTGCTTATACTTTGAAATCACCAATGGGAACAGAAGTTA
GTCAAGATGCTAAAAACGCTATTATGGACAACGTCACATTCTTAAGAAAA
CAAGGATTCAAAGTAACAGAGATAGACTTAGCAATTGATGGTAGAGCATT
AATGCGTGATTATTCAACCTTGGCTATTGGCATGGGAGGAGCTTTTTCAA
CAATTGAAAAAGACTTAAAAAAACATGGTTTTACTAAAGAAGACGTTGAT
CCTATTACTTGGGCAGTTCATGTTATTTATCAAAATTCAGATAAGGCTGA
ACTTAAGAAATCTATTATGGAAGCCCAAAAACATATGGATGATTATCGTA
AGGCAATGGAGAAGCTTCACAAGCAATTTCCTATTTTCTTATCGCCAACG
ACCGCAAGTTTAGCCCCTCTAAATACAGAtCCATATGTAACAGAGGAAGA
TAAAAGAGCGATTTATAATATGGAAAACTTGAGCCAAGAAGAAAGAATTG
CTCTCTTTAATCGCCAGTGGGAGCCTATGTTGCGTAGAACACCTTTTACA
CAAATTGCTAATATGACAGGACTCCCAGCTATCAGTATCCCGACTTACTT
ATCTGAGTCTGGTTTACCCATAGGGACGATGTTAATGGCAGGTGCAAACT
ATGATATGGTATTAATTAAATTTGCAACTTTCTTTGAAAAACATCATGGT
TTTAATGTTAAATGGCAAAGAATAATAGATAAAGAAGTGAAACCATCTAC
TGGCCTAATACAGCCTACTAACTCCCTCTTTAAAGCTCATTCATCATTAG
TAAATTTAGAAGAAAATTCACAAGTTACTCAAGTATCTATCTCTAAAAAA
TGGATGAAATCGTCTGTTAAAAATAAACCATCCGTAATGGCATATCAAAA
AGCA

SEQ ID NO: 4812 
STRAIN JM9130013 

TTCAGTAGCTCCTACTACAAATACTATCGTTCAAACTAATGACAGTAATC 
C T AC C G C AAAAT T T T CAT C AGAAT C AG GAC AAT CT GT AAT AG GT CAAGT A 
AAAC C AG C T AAT TCTGTGGCGCT T AC AAC AGT T G AC ACG C C T CAT AT T T C 
AG CT C C AG AT G C T TT AAAAAC AAC T C AAT C AAGT C C T GT CGT T GAG AGT C 
C T T CT ACT AAGT T AAC T GAAGAG AC AT AC AAAC AAAAAG AT GGT C AAG AG 
TTAGCCAACATGGT GAGAAGT GGT CAAGT TACT AGT GAGGAACT CGT CAA 
TATGGC AT ACGAT ATT ATTGCT AAAGAAAAC C CAT CT T T AAATGCAGTC A 
T TACT ACT AG AC GC C AAG AAG C T AT T GAAGAGG CT AG AAAACT T AAAG AT 
ACCAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGCA 
C AGT AT T AAAG GT G G T G AAAC C AAT AAT GG C T T GAT C T AT G C AG GT GG AA 
AAAT T AGC AC AT T T GAC AGT AG C T AT GT C AAAAAAT AT AAAG AT T T AG G A 
T T TAT TAT T T T AGG AC AAACG AAC T T T C C AGAGT AT GG AT G G C G C AAT AT 
AACAGATTCTAAATTATACGGTCCAACGCATAACCCTTGGAATCTTGCTC 
AT AAT GCT GGT GGCTCTTCTGGTGGAAGTGC AGC AGTT ATTGCT AGCGGG 
ATGACGCCAATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTCC 
ATCTT CTTGGACGGGCTTGGT AGGTT T AAAACCAACAAGAGGAT TGGTGA 
GTAATGAAAAGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAACT 
AAGTCATCTAGAGACGCAGAAACATTATTAACTTATCTAAAGAAAAGCGA 
T C AAACG CT AGT AT C AGT T AAT GAT T T AAAAT CT T T AC C AAT T G C T TATA 
C T T T GAAAT CAC C AAT GG G AAC AGAAGT TAG T C AAGAT G CT AAAAAT GC T 
AT T AT GG AC AAC GT CAT AT T C T T AAG AAAAC AAGGAT T C AAAGT GAC AG A 
GATAGACTTACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAACCT 
TGGCTATTGGTATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTAAAA 
AAAC AT GGT T T T ACT AAAGAAG AC GT T GAT C C C AT T AC T T G GGGAGT T C A 
T GT TAT T TAT C AAAAT T C AG AT AAGG C T G AACT T AAG AAAT CT AT TAT G G 
AAGCCCAAAAACATATGGATGATTATCGTAAGGCAATGGAGAAGCTTCAC 
AAGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTCT 
AAATACAGATCCATATGTAACAGAGGAAGATAAAAGAGCGATTTATAATA 
TGGAAAACTTGAGCCAAGAAG AAAG AAT T GCT CTCTTT AAT CGCCAGTGG 
GAG C C T AT GT T G CGT AGAAC AC C T T T T AC AC AAAT T G C T AAT AT GAC AGG 
ACTCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCCA 
TAGGGACGATGTTAATGGCAGGTGCAAACTATGATATGGTATTAATTAAA 
TTTGCAACTTTCTTTGAAAAATATCATGGTTTTAATGTTAAATGGCAAAG 
AATAATAGATAAAGAAGTGAAACCATCTACTGGCCTAATACAGCCTACTA 
ACT C C C T CT T T AAAG C T CAT T CAT CAT T AGT AAAT T TAG AAG AAAAT T C A 
CAAGTT ACT CAAGT AT CT AT CT CTAAAAAAT GGAT GAAATCGT CTGTT AA 
AAATAAACCATCCGTAATGGCATAT 

SEQ ID NO: 4813 
STRAIN H36B 

CTTCAGTAGTTCCTACTACAAATACTATCGTTCAAACTAATGACAGTAAT
CCTACCGCAAAATTTTCATCAGAATCAGGACAATCTGTAATAGGTCAAGT
AAAACCAGCTAATTCTGTGGCGCTTACAACAGTTGAGACGCCTCATATTT
CAGCTCCAGATGCTTTAAAAACAACTCAATCAAGTCCTGTCGTTGAGAGT
CCTTCTACTAAGTTAACTGAAGAGACATACAAACAAAAAGATGGTCAAGA
TTTAGCCAACATGGTGAGAAGTGGTCAAGTTACTAGTGAGGAACTCGTCA
ATATGGCATaCGATAtTATTGCTAAAGAAAACCCATCTTTAAATGCAGTC
ATTACTACTAGACGCCAAGAAGCTATTGAAGAGGCTAGAAAACTTAAAGA
TACCAATCAGCCGTTTTTAGGTGTTCCCTTGTTAGTCAAGGGGTTAGGGC
ACAGTATTAAAGGTGGTGAAACCAATAATGGCTTGATCTATGCAGGTGGA
AAAATTAGCACATTTGACAGTAGCTATGTCAAAAAATATAAAGATTTAGG
ATTTATTATTTTAGGACAAACGAACTTTCCAGAGTATGGATGGCGCAATA
TAACAGATTCTAAATTATACGGTCCAACGCATAACCCTTGGAATCTTGCT
CATAATGCTGGTGGCTCTTCTGGTGGAAGTGCAGCAGTTATTGCTAGCGG
GATGACGCCAATTGCTAGCGGTAGTGATGCTGGTGGTTCTATCCGTATTC
CATCTTCTTGGACGGGCTTGGTAGGTTTAAAACCAACAAGAGGATTGGTG
AGTAATGAAAAGCCAGATTCGTATAGTACAGCAGTTCATTTTCCATTAAC
TAAGTCATCTAGAGACGCAGAAACATTATTAACTTATCTAAAGAAAAGCG
ATCAAACGCTAGTATCAGTTAATGATTTAAAATCTTTAGCAATTGCTTAT
ACTTTGAAATCACCAATGGGAACAGAAGTTAGTCAAGATGCTAAAAATGC
TATTATGGACAACGTCATATTCTTAAGAAAACAAGGATTCAAAGTGACAG
AGATAGACTTACCAATTGATGGTAGAGCATTAATGCGTGATTATTCAACC
TTGGCTATTGGTATGGGAGGAGCTTTTTCAACAATTGAAAAAGACTTAAA
AAAACATGGTTTTACTAAAGAAGACGTTGATCCCATTACTTGGGCAGTTC
ATGTTATTTATCAAAATTCAGATAAGGCTGAACTTAAGAAATCTATTATG
GAAGCCCAAAAACATATGGATGATTATCGTAAGGCAATGGAGAAGCTTCA
CAAGCAATTTCCTATTTTCTTATCGCCAACGACCGCAAGTTTAGCCCCTC
TAAATACAGATCCATATGTAACAGAGGAAGATAAAAGAGCGATTTATAAT
ATGGAAAACTTGAGCCAAGAAGAAAGAATTGCTCTCTTTAATCGCCAGTG
GGAGCCTATGTTGCGTAGAACACCTTTTACACAAATTGCTAATATGACAG
GACTCCCAGCTATCAGTATCCCGACTTACTTATCTGAGTCTGGTTTACCC
ATAGGGACGATGTTAATGGCAGGTGCAAACTATGATATGGTATTAATTAA
ATTTGCAACTTTCTTTGAAAAATATCATGGTTTTAATGTTAAATGGCAAA
GAATAATAGATAAAGAAGTGAAACCATCTACTGGCCTAATACAGCCTACT
AACTCCCTCTTTAAAGCTCATTCATCATTAGTAAATTTAGAAGAAAATTC
ACAAGTTACTCAAGTATCTATCTCTAAAAAATGGATGAAATCGTCTGTTA
AAAATAAA

SEQ ID NO: 4814 

STRAIN 2603 frame: 1

NSTETSASVVPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAP
DALKTTQSSPVVESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPS
LNAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFD
SSYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIAS
GMTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETL
LTYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEID
LPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELK
KSIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQ
EERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLI
KFATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISK
KWMKSSVKNKPSVMAYQKA

SEQ ID NO: 4815 

STRAIN 090 frame: 1

NSTETSASVVPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAP 
DALKTTQSSPVVESTSTKLTEETYKQKDGKDLANMVRSGQVTSEELVNMAYDIIAKENPS
LNAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFD
SSYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIAS 
GMTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETL 
LTYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEID 
LPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELK 
KSIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQ
EERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLI 
KFATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISK 
KWMKSSVKNKPSVMAYQKA

SEQ ID NO: 4816

STRAIN A909 frame: 2 

TTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAPDALKTTQSSPV 
VESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTRRQE 
AIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYKDLG 
FIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIASGMTPIASGSDA 
GGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSDQTL
VSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDLPIDGRALMRD
YSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIMEAQKHMD
DYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFNRQW 
EPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEKHHG 
FNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVKNKP 
SVMAYQKA 

SEQ ID NO: 4817 

STRAIN COH1 frame: 1 

NSTETSASVAPTTNTIVQTNDSNPTAKFASESGQSVIGQVKPANSAALTTVDTPHISAPD 
ALKTTQSSPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSL
NAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDS 
SYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAAIASG 
MTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLL 
TYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDL
PIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKK 
SIVEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEKDKRAIYNMENLSQE 
ERIALFNRQWEPMLRRTPFTPIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIK 
FATFFEKHHGFNVKWQRIIDKEVKPSADLIQPTNSLFKAHSSLVNLEENSQVTQVSISKK 
WMKSSVKNKPSVMAYQKA

SEQ ID NO: 4818 

STRAIN M732 frame: 1 

SVAPTTNTIVQTNDSNPTAKFASESGQSVIGQVKPANSAALTTVDTPHISAPDALKTTQS 
SPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTR
RQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYK 
DLGFIILGQTNFPEYGWRNITDSKLYGXTHNPWDLAHNAGGSSGGSAAAIASGMTPIASG 
SDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 
QTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDLPIDGRAL
MRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIVEAQK
HMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEKDKRAIYNMENLSQEERIALFN 
RQWEPMLRRTPFTPIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEK 
HHGFNVKWQRIIDKEVKPSADLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVK
NKPSVMAYQKA

SEQ ID NO: 4819 

STRAIN 18RS21 frame: 1 

NSTETSASVVPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAP
DALKTTQSSPVVESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPS
LNAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFD 
SSYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIAS 
GMTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETL
LTYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEID
LPIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELK 
KSIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQ
EERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLI 
KFATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISK
KWMKSSVKNKPSVMAYQKA

SEQ ID NO: 4820 

STRAIN M781 frame: 2

ASVAPTTNTIVQTNDSNPTAKFASESGQSVIGQVKPANSAALTTVDTPHISAPDALKTTQ 
SSPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITT
RRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKY
KDLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAAIASGMTPIAS
GSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKS
DQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLREQGFKVTEIDLPIDGRA
LMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIVEAQ
KHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEKDKRAIYNMENLSQEERIALF
NRQWEPMLRRTPFTPIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFE 
KHHGFNVKWQRIIDKEVKPSADLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSV 
KNKPSVMAYQKA 

SEQ ID NO: 4821 

STRAIN CJB110 frame: 3 

VPTTNTIVQTNDSNPTAKFVSESGQSVIGQVKPDNSAALTTVDTPHHISAPDALKTTQSS 
PVVESTSTKLTEETYKQKDGKDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTRR
QEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDSSYVKKYKD 
LGFIILGQTNFPEYGWRNITDSKLYGLTHNPWDLAHNAGGSSGGSAAAIASGMTPIASGS 
DAGGSIRIPSSWTGLVGLKPTRGLVSHEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSDQ 
TLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDLPIDGRALM 
RDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIMEAQKH 
MDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFNR 
QWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEKH 
HGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVKN 
KPSVMAYQKA 

SEQ ID NO: 4822 

STRAIN 1169NT frame: 1 

NSTETSASVAPTTNTIVQTNDSNPTAKFASESGQSVICQVKPDNSAALTTVDTPHISAPD 
DLKTTQSSPVVESTSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSL
NAVITTRRQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYADGKISTFDS 
SYVKKYKDLGFIILGQTNFPEYGWRNITDSKLYGPTHNPRNLAHNAGGSSGGSAAAIASG 
MTPIASGSDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLL 
TYLKKSDQTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVTFLRKQGFKVTEIDL 
PIDGRALMRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKK 
SIMEAQKHMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQE 
ERIALFNRQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIK 
FATFFEKHHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKK
WMKSSVKNKPSVMAYQKA 

SEQ ID NO: 4823 

STRAIN JM9130013 frame: 2 

SVAPTTNTIVQTNDSNPTAKFSSESGQSVIGQVKPANSVALTTVDTPHISAPDALKTTQS 
SPVVESPSTKLTEETYKQKDGQELANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTR
RQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYAGGKISTFDSSYVKKYK 
DLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAVIASGMTPIASG 
SDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 
QTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVIFLRKQGFKVTEIDLPIDGRAL 
MRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWGVHVIYQNSDKAELKKSIMEAQK 
HMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFN 
RQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEK 
YHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVK 
NKPSVMAY 

SEQ ID NO: 4824 

STRAIN H36B frame: 3 

SVVPTTNTIVQTNDSNPTAKFSSESGQSVIGQVKPANSVALTTVDTPHISAPDALKTTQS 
SPVVESPSTKLTEETYKQKDGQDLANMVRSGQVTSEELVNMAYDIIAKENPSLNAVITTR 
RQEAIEEARKLKDTNQPFLGVPLLVKGLGHSIKGGETNNGLIYAGGKISTFDSSYVKKYK 
DLGFIILGQTNFPEYGWRNITDSKLYGPTHNPWNLAHNAGGSSGGSAAVIASGMTPIASG 
SDAGGSIRIPSSWTGLVGLKPTRGLVSNEKPDSYSTAVHFPLTKSSRDAETLLTYLKKSD 
QTLVSVNDLKSLPIAYTLKSPMGTEVSQDAKNAIMDNVIFLRKQGFKVTEIDLPIDGRAL 
MRDYSTLAIGMGGAFSTIEKDLKKHGFTKEDVDPITWAVHVIYQNSDKAELKKSIMEAQK 
HMDDYRKAMEKLHKQFPIFLSPTTASLAPLNTDPYVTEEDKRAIYNMENLSQEERIALFN 
RQWEPMLRRTPFTQIANMTGLPAISIPTYLSESGLPIGTMLMAGANYDMVLIKFATFFEK 
YHGFNVKWQRIIDKEVKPSTGLIQPTNSLFKAHSSLVNLEENSQVTQVSISKKWMKSSVK 
NK 

SEQ ID NO: 4901
STRAIN 2603

aaacatccgatacttaatgatcaaaaatccttagcaattgttgaacagat 
agaatatgattttgataaattcgataattcagaagcttctttttatgcaa 
cattagctagawttcgcgttatggatagagaaatcaaaaaatttattaga 
gaaaatccaaatagtcaaatcctttcaattggttgtggacttgatacaag 
gtttgaaagagtcgataatggacaaattaggtggtataaccttgatttgc 
cagaggttatggagataagaaaattattttttgaagagcatgaaagagtt 
actaatatagcaaaatcagccctagatgaaacttggacacgggaggtaaa 
tccccaaaatgccccttttctaatcgtgtcagaaggtgttttaatgtttc 
taaaagaagatgacgtagagacttttcttcatatcctgacaaattcattt 
agccaatttatggcacaatttgatttgtgtcataaggaaatgattaataa 
aggaaagcaacatgatacagtaaagtatatggatacagaatttcagtttg 
gtatcacagatggtcatgagattgtggatttagaccctaaattaaagcaa 
ataaatctgattaactttacagatgagatgagcaaatttgagttaggcac 
acttcgctctttacttccaacaattcgtaaatttaataattgtttaggtg 
tgtacgaatataaagcatc 

SEQ ID NO: 4902 
STRAIN 090 

T AAT GAT C AAAAAT C C T T AG C AAT T G T T G AAC AG AT AG AAT AT GAT T T T G 
AT AAAT T C G AT AAT T C AGAAG C T T CT T T T T AT G C AAC AT TAG C T AGAAT T 
C G C GT T AT GG AT AG AG AAAT C AAAAAAT T TAT T AG AGAAAAT C C AAAT AG 
TCAAATCCTTTCAATTGGTTGTGGACTTGATACAAGGTTTGAAAGAGTCG 
AT AAT G G AC AAAT T AGGT G G T AT AAC CT T GAT T T G C C AG Ag GT T AT GGAG 
ATAAGAAAATTATTTTTTGAAGAGCATGAAAGAGTTACTAATATAGCAAA 
AT C AG C CAT AG AT G AAAC T T GGAC AC G GGAG GT AAAT C C C C AAAAT G C C C 
CTTTTCTAATCGTGTCAGAAGGTGTTTTAATGTTTCTAAAAGAAGATGAC 
GTAGAGACTTTTCTTCATATCCTGACAAATTCATTTAGCCAATTTATGGC 
AC AAT T T GAT T T GT GT C AT AAGG AAAT GAT T AAT AAAG G AAAGC AAC AT G 
AT AC AGT AAAG T AT AT GG AT AC AG AAT T T C AG T T T G GT AT C AC AG AT GGT 
CATGAGATTGTGGATTTAGACCCTAAATTAAAGC AAAT AAAT CT GAT TAA 
CT T T AC AGAT GAG AT GAG C AAAT T T GAG T T AGG C AC AC TTCGCTCT T T AC 
T T C C AAC AAT T C GT AAAT T T AAT AAT T GT T T AGGT G T GT AC G AAT AT AAA 
GCATC 

SEQ ID NO: 4903 
STRAIN A909 

AAACATCCGATACTTAATGA
TCAAAAATCCTTAGCAATTGTTGAACAGATAGAATATGATTTTGATAAAT
TCGATAATTCAGAAGCTTCTTTTTATGCAACATTAGCTAGAATTCGCGTT
ATGGATAGAGAAATCAAAAAATTTATTAGAGAAAATCCAAATAGTCAAAT
CcTTTCaATTGGTTGTGGACTTGATACAAGGTTTGAAAGAGTCGATAATG
GACAAATTAGGTGGTATAACCTTGATTTGCCAGAGGTTATGGAGATAAGA
AAATTaTTTTTTGAAGAGCATGAAAGAGTTACTAATATAGCAAAATCAGC
CCTAGATGaAACTTGGACACGGGAGGTAAATCCCCAAAATGCCCCTTTTC
TAATCGTGTCAGAAGGTGTTTTAATGTTtCTAAAAGAAGATGACGTAGAG
ACTTTTcTTCATATCCTGACAAATTCATTTAGCCAATTTATGGCACAATT
TGATTTGTGTCATAAGGAAATGATTAATAAAGGAAAGCAACATGATACAG
TAAAGTATATGGATACAGAATTTCAGTTTGGTATCACAGATGGTCATGAG
ATTGTGGATTTAGACCCTAAATTAAAGCAAATAAATCTGATTAACTTTAC
AGATGAGATGAGCAAATTTGAGTTAGGCACACTTCGCTCTTTACTTCCAA
CAATTCGTAAATTTAATAATTGTTTAGGTGTGTACGAATATAAAGCATC

SEQ ID NO: 4904 
STRAIN H36B 

AAACATCCGATACTTAATGATCAAAAATCCTTAGCA
ATTGTTGAACAGATAGAATATGATTTTGATAAATTCGATAATTCAGAAGC
TTCTTTTTATGCAaCATTAGCTAGAATTCGCGTTATGGATAGAGAAATCA
AAAAATTTATTAGAGAAAATCCAAATAGTCATATCCTTTCAATTGGCTGT
GgACTTGATACAAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTA
TAACCTTGATTTGCCAGAGGTTATGGAGATAAGAAAATTATTTTTTGAAG
AGCATGAAAGAGTTACTAATATAGCAAAATCAGCCcTAGATGAAACTTGG
ACACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGG
TGTTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTCTTCATATCC
TGACAAATTCATTTAGCGAATTTATGGCACAATTTGATTTGTGTCAgAAG
GAAATGATTAATAAAGGAAAGCAACATGATACAGTAAAGTATATGGATAC
AGAATTTCAGTTGGGTATCACAGATGGTCATGAAATTGTGGATTTAGACC
CTAAATTAAAGCAAATAAATCTGATTAACTTTACAGATGAGATGAGCAAA
TTTGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAA
TAATTGTTTAGGTGTGTACGAATATAAAGCATC

SEQ ID NO: 4905 
STRAIN 18RS21 

AACAT C C GAT AC T T AAT GAT C AAAAAT C C T T AG C AAT 
T GT T G AAC AG AT AG AAT AT GAT T T T GAT AAAT T C GAT AAT T C AG AAG C TT 
C T T T T T AT G C AAC AT TAG C T AGAAT T C G C GT T AT GGAT AGAGAAAT C AAA 
AAATTTATTAGAGAAAATCCAAATAGTCaAATCCTTTCAATTGGTTGTGG 
ACTTGATACAAGGTTTGAAAGAGTCGATAATGGACAAATTAGGTGGTATA 
AC CT T GAT T T G C C AG AGGT T AT G G AGAT AAG AAAAT T ATT T T T T GAAG AG 
CAT G AAAG AGT TAG T AAT AT AGC AAAAT C AGC C C TAG AT G AAAC T T G GAC 
AC GGG AG GT AAAT C C C C AAAAT GCCCCTTTT CT AAT C GT GT C Ag AAGGT G 
T T T T AAT GT T T CT AAAAGAAG AT G AC GT AGAG ACT T T T CT T CAT AT C C T G 
AC AAAT T CAT T T AG C C AAT T T AT GG C AC a AT T T GAT T T GT G T C AT Aa GG A 
AAT GAT T AAT AAAGGAAAG C AAC AT GAT AC AG T AAAGT AT AT GGAT AC AG 
AAT T T C AGT T T GGT AT C AC AG AT GG T CAT GAGAT T GT GGAT T T AGAC C C T 
AAAT T AAAG C AAAT AAAT C T GAT T AAC T T T AC AG AT GAG AT GAG C AAAT T 
TGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAATA 
ATTGTTTAGGTGTGTACGAAtATAaaGCATC 

SEQ ID NO: 4906 
STRAIN M732 

AAAC AT C C GAT AC T T AAT GAT C AAAAAT C C T T AG C AAT T GT T G AAC A 
GAT AGAAT AT GAT T T GGAT AAAT T CG AT AAT T C AGAAG CTTCTTTTT AT G 
C AAC AT TAG CT AGAAT T C G C G T TAT GGAT AGAG AAAT C AAAAAAT T TAT T 
AGAG AAAAT C C AAAT AGT C AAAT C CT T T C AAT T GGT T G T GG AC T T GAT AC 
AAG GT T T G AAAG AGT C GAT AAT GG AC AAAT T AG GT GG T AT AAC CT T GAT T 
T G C C AG AGGT TAT GG AG AT AAG AAAAT TAT T T T T T GAAG AG CAT G AAAG A 
GT T AC T AAT AT AG C AAAAT C AG C C CT AG AT GAAAC T T GG AC AC G GGAGGT 
AAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGGTGTTTTAATGT 
T T C T AAAAg AAG AT G AC G TAG AGAC T T T T C T T C At AT C C T GAC AAAT T C A 
T T T AG CC AAT T TAT GG C a C AAT T T GAT T T GT GT C AT AAGG AAAT G AT T AA 
T AAAG G AAAG C AAC AT GAT AC AG T AAAGT AT AT G GAT AC AGAAT T T C AGT 
T T G GT AT C AC AG AT GGT CAT GAGAT T GT GG AT T TAG AC C C T AAAT T AAAG 
CAAATAAATCTGATTAACTTTACAGATGAGATGAGCAAATTTGAGTTAgG
CACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAATAATTGTTTAG
GtGTGTACGAATATAAAGCATC

SEQ ID NO: 4907 
STRAIN COH1 

AAAC AT C CGAT ACT T AAT GAT C AAAAAT C C T T AG C AA 
T T G T T G AAC AG AT AG AAT AT GAT T T G GAT AAAT T C GAT AAT T C AGAAG CT 
T CT T T T T AT GC AAC AT TAG C TAG AAT T CG C GT TAT GGAT AG AG AAAT C AA 
AAAAT T TAT TAG AG AAAAT C C AAAT AGT C AAAT C C T T T C AAT TGGTTGTG 
G AC T T GAT AC AAGGT T T G AAAG AGT C GAT AAT GG AC AAAT T AGG T G G T AT 
AAC CT T GAT T T G C C AG AGGT TAT G G AG AT AAGAAAAT TAT T T T T T GAAG A 
G C AT G AAAG AGT T AC T AAT AT AGC AAAAT C AG C C C TAG AT G AAACT T GG A 
CACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGGT 
GTTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTCTTCATATCCT 
GAC AAAT T CAT T TAG C C AAT T TAT GG C AC AAT T T GAT T T G T GT CAT AAG G 
AAAT GAT T AAT AAAGGAAAG CAACATGATACAGT AAAGT AT AT GGAT AC A 
GAAT T T C AGT T T G GT AT C AC AGAT GGT CAT GAGAT T G T G GAT T TAG AC C C 
T AAAT T AAAG C AAAT AAAT C T G AT T AAC T T T AC AG AT GAGAT G AGC AAAT 
TTGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAAT 
AAT T G T T T AGG T G T G T AC GAAT AT AAAG CAT C 

SEQ ID NO: 4908 
STRAIN M781 

AAACATCCGATACTTAATGATCA
AAAATCCTTAGCAATTGTTGAACAGATAGAATATGATTTGGATAAATTCG
AT AAT T C AGAAGC T T CT T T T TAT G C AAC AT TAG CT AG AAT T C G C GT TAT G 
G AT AGAGAAAT C AAAAAAT T T AT T AG AGAAAAT C C AAAT AGT C AAAT C C T 
TTCAATTGGTTGTGGACTTGATACAAGGTTTGAAAGAGTCGATAATGGAC 
AAAT TAG GT GGT AT AAC C T T GAT T T GC C AG AGGT TAT G GAGAT AAGAAAA 
T TAT T T T T T G AAG AG CAT G AAAG AGT T AC T AAT AT AG C AAAAT C AG C C C T 
AG AT GAAAC T T GG AC ACGG GAG GT AAAT CC C C AAAAT GCCCCTTTTC T AA 
T CGT GT C AG AAGGT GT T T T AAT G T T T C T AAAAg AAGAT G ACGT AG AG AC T 
T T T C T T CAT AT C C T G AC AAAT t CAT T T AGC C AAT T T A t G G C AC AAT T T G A 
TTTGTGTCATAAGGAAATGATTAATAAAGGAAAGCAACATGATACAGTAA 
AGT AT ATGGATACAGAATT T CAGTT TGGTAT CACAGATGGT CAT GAGATT 
GTGG AT T T Ag AC C C T AAAT T AAAG C AAAT AAAT C T GAT T AACT T T AC AG A 
TGAGATGAGCAAATTTGAGTTAGGCACACTTCGCTCTTTACTTCCAACAA 
TTCGTAAATTTAATAATtGTTTAGGTGTGTACGAATATAAAGCATC 

SEQ ID NO: 4909 
STRAIN CJB110 

AAAC AT C CG AT AC T T AAT G AT C AAAAAT C C T TAG C AA 
T T GT T GAAC AG AT AG AAT AT GATT T T GAT AAAT T C GAT AAT T C AG AAG C T 
T C T T T T TAT GC AAC AT T AG CT AGAAT T CG C GT TAT GGAT AG AG AAAT C AA 
AAAATTTATTAGAGAAAATCCAAATAGTCAAATCCTTTCAATTGGTTGTG 
GAC T T G AT AC AAGG T T T G AAAG AGT C GAT AAT G G AC AAAT T AGG T G GT AT 
AAC C T T GAT T T G C C AG AGGT TAT G GAGAT AAG AAAAT TAT T T T T T G AAGA 
GCAT G AAAGAGT T AC T AAT AT AG C AAAAT C AG C CAT AG AT GAAAC T T GG A 
CACGGGAGGTAAATCCCCAAAATGCCCCTTTTCTAATCGTGTCAGAAGGT 
GTT T T AAT G TT TCT AAAAG AAG AT G ACGT AGAG ACT TTTCTTC AT ATCCT 
GAC AAAT T CAT T T AG C C AAT T T AT G G C AC AAT T T GAT T T GT GT C AT AAG G 
AAAT GAT T AAT AAAGG AAAG C AAC AT GAT AC AGT AAAG TAT AT GGAT AC A 
G AAT T T C AGT T T GGT AT C AC AGAT GGT CAT GAGAT T GT GGAT T TAG AC C C 
T AAAT T AAAG C AAAT AAAT C T G AT T AACT T T AC AG AT GAGAT GAG C AAAT 
T T GAG T T AGG C AC AC TTCGCTCTT T AC T T C C AAC AAT T C GT AAAT T T AAT 
AAT T GT T T AGG T G T GT AC G AAT AT AAAG CAT C 

SEQ ID NO: 4910 
STRAIN 1169NT 

AAAC AT C C GAT AC T T AAT GAT C AAAAAT C C T T AG C AAT 
T GT T GAAC AGAT AGAAT AT GAT T T T GAT AAAT T C GAT AAT T C AG AAG CT T 
CT T T T TAT GC AAC AT TAG C T AGAAT T C G C G T TAT G GAT AG AG AAAT C AAA 
AAAT T TAT TAG AG AAAAT C C AAAT AGT CAT AT C C T T T C T AT TGGTTGTGG 
AC T T GAT AC AAGGT T T G AAAG AGT C GAT AAT GGAC AAAT T AG GT GGT AT A 
AC CT T GAT T T G C C AGAGG TT AT GG AGAT AAG AAAAT TAT T T T T T G AAG AG 
CAT G AAAG AGT TACT AAT AT AG C AAAAT C AG C C C T AGAT GAAAC T T GGAC 
ACAGGAGGTAAATCCCCAAAATGCCCCTTTTCTGATCGTGTCAGAAGGTG 
TTTTAATGTTTCTAAAAGAAGATGACGTAGAGACTTTTcTTCATATCCTG 
AC AAAT T CAT T TAG C C AAT T T AT G G C AC AAT T T GAT T T G T G t C AG AAGG A 
AAT GAT T AAT AAAG G AAAG C AAC AT G AT AC AGT AAAGT AT AT G GAT AC AG 
AATTTCAGTTTGGTATCACAGATGGTCATGAAATTGTGGATTTAGACCCT 
AAAT T AAAG C AAAT AAAT C T GAT T AAC T T TAG AG AT GAGAT GAG C AAAT T 
TGAGTTAGGCACACTTCGCTCTTTACTTCCAACAATTCGTAAATTTAATA 
AT T GT T TAG GTGTGTAC GAAT AT AAAGC AT C 

SEQ ID NO: 4911 
STRAIN JM9130013 

AGC AAT T GTT GAAC AGAT AGAAT AT GATT 

T T GAT AAAT T C GAT AAT T C AG AAG CTTCTTTT T ATG C AAC AT TAG C TAG A 
AT T CG C GT T AT G GAT AG AG AAAT C AAAAAAT T TAT TAG AG AAAAT C C AAA 
TAGTCATATCCTTTCAATTGGCTGTGGACTTGATACAAGGTTTGAAAGAG 
T CGAT AAT G GAC AAAT T AGGT G G TAT AAC C T T GAT T T G C C AG AG GT TAT G 
GAG AT AAG AAAAT TAT T T T T T G AAG AG CAT G AAAG AGT T AC T AAT AT AGC 
AAAAT C AGC CCT AG AT GAAACTTGG AC ACGGG AGGT AAAT CCCC AAAAT G 
CCCCTTTTCTAATCGTGTCAGAAGGTGTTTTAATGTTTCTAAAAGAAGAT 
G AC GT AG AG AC T T T T C T T CAT AT C CT GAC AAAT T CAT T T AG C C AAT T TAT 
G G C AC AAT T T GAT T T GT GT CAg AAGG AAAT GAT T AAT AAAGG AAAG C AAC 
AT GAT AC AGT AAAGT AT AT GGAT AC AG AAT T T C AGT T T G G TAT C AC AG AT 
GGT C AT GAAAT T GT GGAT T T AGAC C C T AAAT T AAAG C AAAT AAAT C T GAT 
T AACT T T AC AGAT GAG AT GAG C AAAT T T G AGT T AGGC AC ACT TCGCTCTT 
T AC T T C C AAC AAT T CGT AAAT T T AAT AAT T GT T T AGGT GT GT ACGAAT AT 
AAAG CATC 

SEQ ID NO: 4912 

STRAIN 2603 frame: 1 

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARXRVMDREIKKFIRENPNSQILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4913 

STRAIN 090 frame: 2 

NDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSIGCGLD 
TRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSAIDETWTREVNPQNAPFLI 
VSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTEFQFGI 
TDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4914 

STRAIN A909 frame: 1 

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4915 

STRAIN H36B frame: 1 

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSHILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCQKEMINKGKQHDTVKYMDTE 
FQLGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 



SEQ ID NO: 4916 

STRAIN 18RS21 frame: 3 

HPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSIG 
CGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQNA 
PFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTEF 
QFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4917 

STRAIN M732 frame: 1 

KHPILNDQKSLAIVEQIEYDLDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4918 

STRAIN COH1 frame: 1 

KHPILNDQKSLAIVEQIEYDLDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4919 

STRAIN M781 frame: 1 

KHPILNDQKSLAIVEQIEYDLDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4920 

STRAIN CJB110 frame: 1 

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSQILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSAIDETWTREVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4921 

STRAIN 1169NT frame: 1 

KHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSHILSI 
GCGLDTRFERVDNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTQEVNPQN 
APFLIVSEGVLMFLKEDDVETFLHILTNSFSQFMAQFDLCQKEMINKGKQHDTVKYMDTE 
FQFGITDGHEIVDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO: 4922 

STRAIN JM9130013 frame: 2 

AIVEQIEYDFDKFDNSEASFYATLARIRVMDREIKKFIRENPNSHILSIGCGLDTRFERV 
DNGQIRWYNLDLPEVMEIRKLFFEEHERVTNIAKSALDETWTREVNPQNAPFLIVSEGVL 
MFLKEDDVETFLHILTNSFSQFMAQFDLCQKEMINKGKQHDTVKYMDTEFQFGITDGHEI 
VDLDPKLKQINLINFTDEMSKFELGTLRSLLPTIRKFNNCLGVYEYKA 

SEQ ID NO. 5001 
STRAIN 2603 

AT G AAAAAAC AAAAAC TAT TAG T G C T T AT T GGAG G C T TAT T AAT AAT GAT AAT GAT GAC A 
GC AT GT AAG GAT T C AAAAAT CC C AG AAAAC CG C AC AAAGGAAG AG T AC C AAG C T G AAC AA 
AAT T T T AAAC CGT T T T T T G AGT T T T T AG C AC AAAAAGAT AAAG AT T T GAGC AAAAT ACAA 
AAAT AC T TAG TAT TAG TAT C GG AT T C AGG T GAT G CAT T AGAT T TAG AAT AT T T C T AT AGT 
AT T C AAGAT T T AAAAAAAAAT AAGGAT T T AG GG AAGT T T G AAAC AAGAAAAAGT C AAAT A 
G AAAAG CCGGGTGGC TAT AAT GAG T T AGAAAAT AAAG AG GT CC CAT TT G AAT AT T T T AAA 
AAT AAT AT AGT T TAT C C AAAAGGAAAAC C GAAT AT T AC AT T T GAT GAC T T TAT TAT C G G A 
G CAAT G GAT AC T AAAGAAT T AAAAG AAT T AAAAAAAT T AAAAG T AAAAAGT TAT T T AT T A 
AAAC AT C C G G AAAC T G AGT T G AAAGAT AT AAC AT AT GAAT T G C CG AC AC AG T C G AAG CT T 
ATTAAAAAA 

SEQ ID NO. 5002 

STRAIN 090 

T AAGGAT T C AAAAAT C C C AGAAAAC C G C AC AAAG 

GAAG AGT AC C AAG C T G AAC AAAAT T T T AAAC TGTTTTTT GAGT T T T T AGC 
AC AAAAAT AT AAAGAT TT G AAC AAAAT AC AAAAAT AC T T AC TAT TAG TAT 
C GGAT T C AG GT G AT G CAT TAG AT T TAG AAT AT T T CT AT AGT AT T C AAG AT 
T T AAAAAAAAAT AAGGAT T T AG GG AAGT T T G AAAC AAG AAAAAGT C AAAT 
AGAAAAG C CG G G T GG CT AT AAT GAGT TAG AAAAT AAAGAG GT C C CAT T T G 
AAT AT T T T AAAAAT AAT AT AGT T TAT C C AAAAG G AAAAC C GAAT AT T AC A 
T T T GAT GAC T T TAT TAT C G GAG CAAT G GAT AC T AAAGAAT T AAAAAAAT T 
AAAAGT AAAAAGT TAT T TAT T AAAAC AT C C GG AAAC T GAG T T GAAAG AT A 
T AAC AT AT GAAT T G C C GAC AC AG T C G AAGCT T AT T AAAAAA 

SEQ ID NO. 5003 

STRAIN 18RS21 

T AAG GAT T C AAAAAT C C C AG AAAAC C G C AC AAAG GAAG 
AGTACCAAGCTGAACAAAATTTTAAACCGTTTTTTGAGTTTTTAGCACAA 
AAAG AT AAAGAT T T G AG C AAAAT AC AAAAAT AC T TAG TAT T AGT AT C GG A 
TTCAGGTGATGCATTAGATTTAGAATATTTCTATAGTATTCAAGATTTAA 
AAAAAAAT AAG GAT T T AGG G AAGT T T GAAAC AAGAAAAAGT C AAAT AG AA 
AAG C CG GG T GG C TAT AAT G AGT T AGAAAAT AAAG AGG T C C C AT T T GAAT A 
TTTTAAAAATAATATAGTTTATCCAAAAGGAAAACCGAATATTACATTTG 
AT GAC T T TAT TAT C G GAG CAAT G GAT AC T AAAG AAT T AAAAG AAT TAAAA 
GAAT T AAAAAAAT T AAAAGT AAAAAGT TAT T TAT T AAAAC AT C C GG AAAC 
T G AGT T GAAAG AT AT AAC AT AT GAAT T GC CG G C AC AG T C G AAG CT T AT T A 
AAAAA 

SEQ ID NO. 5004 

STRAIN 2603 frame: 1 

MKKQKLLLLIGGLLIMIMMTACKDSKIPENRTKEEYQAEQNFKPFFEFLAQKDKDLSKIQ 
KYLLLVSDSGDALDLEYFYSIQDLKKNKDLGKFETRKSQIEKPGGYNELENKEVPFEYFK 
NNIVYPKGKPNITFDDFIIGAMDTKELKELKKLKVKSYLLKHPETELKDITYELPTQSKL 
IKK 

SEQ ID NO. 5005 

STRAIN 090 frame: 2 

KDSKIPENRTKEEYQAEQNFKLFFEFLAQKYKDLNKIQKYLLLVSDSGDALDLEYFYSIQ 
DLKKNKDLGKFETRKSQIEKPGGYNELENKEVPFEYFKNNIVYPKGKPNITFDDFIIGAM 
DTKELKKLKVKSYLLKHPETELKDITYELPTQSKLIKK 

SEQ ID NO. 5006 

STRAIN 18RS21 frame: 2 

KDSKIPENRTKEEYQAEQNFKPFFEFLAQKDKDLSKIQKYLLLVSDSGDALDLEYFYSIQ 
DLKKNKDLGKFETRKSQIEKPGGYNELENKEVPFEYFKNNIVYPKGKPNITFDDFIIGAM 
DTKELKELKELKKLKVKSYLLKHPETELKDITYELPAQSKLIKK 

SEQ ID NO. 5101 
STRAIN 2603 

ttgaataataaaggtgtcggtggcgatggtgtccaaatttatcaatacta 
tatcaaaatggacaacaataaaccttacttaagtcccaaagataagacta 
ctgtagagaagttagaagatcgctggaaaaaaattactttcaaagttcag 
gatactggcattggtttgaaagacgtttatcttcaatctgttaagtatgt 
tggtggtggcaataataatttagaccttatcacacctccaggatttaaaa 
aagaagataaaaaagttgaaaaaccaaaattagaccgtccaccaggaatt 
gatttaccagcaccaacttcaatgagaagttttgattattcaaccccacc 
gggaactaagccaagcaaacccaaagatagtttatcaactcctccaggtt 
tcccagatttaaacacgccgccggatgaagcaccaaaggatagtaaaaaa 
gacgctattgaagataaatcaggagcaattaaatatgctaagtctcttca 
acttagctttgttgatggccctattttagctagcaaagtaaatggcaaaa 
tattacaagtcgaatctgatggcaaattagtcattcctagaaatgctttg 
tcagctaatcaatttgatgacactagtcttaaaatttatcgtaataataa 
tcgcaataaagaaattactatcacaacagattattttgcagatacaaaat 
atgtcaatatcacagcggttgactatttgagcaatactacttttgagcaa 
ttagctactggtgaaacagtagattaccatgccattgtattttcaagctt 
tgctgctattaaagacaagggtggtaagatttatgttaacgataaattgc 
aagaaacttctcgtatagcgcttaaagataaatctgttaagattggtatt 
gaattaccaaatgatgtcagacatattgatagtttatctgttcgtcgttt 
gaatgaggttaaaactgttgataatatcttgaaaaatgatgaacaagaca 
ttaatctcagcaaaacttaccaattaaaatacaacccgacaaatcgtcgt 
ctagagtttactattaataacattaactcaagttcagaaatcatgaccac 
tttcaaagatggaaagatgccagaattggttgaacaaaaagatgtttctt 
tggatataaacgatatggacatgagtaagtttaaaactattcgacttgga 
cgaaaggattctgaatttaagggacaacttattgcaaaaactggaacagt 
tgaattagatatgtttttcaaacaatctcaagacccagcttcaattatta 
aaaaaatataccttatccaaaatggtgttccaaatgaattgaaaaaattt 
gactctagttttggtttaactgaaagtcagatagatggatactatattta 
taaagatgcaattaaccttaaatttaaattaaccagtggtgcaagtctta 
aagttgtttataaagggcaagaagatccatatagtcatcagaaagaagat 
atgactaaaaaaggtgaacagctcagtcattcaactcaagccaatgaaaa 
tacagcaaaagtaacctttgctaatattgactggtcacattatagtaagg 
ttactgtgaatggaaaagaagttgttaaaggtagtgagttacctttaact 
aaaggatggacaacatttgtattacataaaacagaaaattcattaaatgt 
taaaagtttgattatggagacgggtagtgtaagtaagaaagttcaacaac 
ttcctttaagtcctagattatctaaaaataagcatatgagggatatgcta 
cttactatgcaaaaagattcagcgtattacgaaacaagtgacagtctagt 
ccttcgaattaatctcactgcagatactaaacttaattttaatgctgtta 
aaggagcgagtgctcttactgaaaatatgatgatgagacagtttgcagtt 
gctggaccacaagatgatcctgttagtgaacataaatacccatcagtatt 
tctcttaactcctgccttattggaaactgctagtgaggcaactctaaatg 
gtaaggaaatcacagcatctggtattatcggtcacatcaaggatggtgat 
aaaagcaagcatgttgaagtcaaaatggtgaatgaaaatggagacatgct 
aggaacccctgttattattcaaggtaaagacttgactaatcgaacaaaac 
cattaatgagtggacgtagagtactttatgccggtaaacaatatgagttc 
cgggctaaattaccacttagtcgttttaacacttggattagggttgaagt 
ggtaacagaagcaggagagaaagcaagtattgttcgtcgcatgttctttg 
accaatcagttccagagcttaacacagcagttgctaaacgtgatttgact 
tctgatactgctcttatccacatcgttgccaaagatgactctctaaaact 
aaaattatatcaagatgattcattacttgaatctgttgataaaaccggtc 
tttatagttttagaaatggtgtagaaatcactaaagatatgacagtacca 
ctagaatttggagataatattattaagttatctgctgttgacttatcaaa 
ttatcgtcgtaatgagacccttcatatctatagaaaccgttttgatgtta 
aagcaagccaaatgacagctgacaaaggagctaaagtaactgtggatatg 
ttgatgaagcacttagttgttccagaaatggcaggagcttatacattaac 
aatcgacgaagctccaaacacaaatgaatcaggaatgttaacaaacgcta 
aagtatcgattcattatgtaaatggtggtgttgataaagttgatgttccg 
attaaagtagttgacttagaagctattcgtaaagctgaagaagcacgtaa 
agctgaagaagcacgtaaagctgaagaagcacgtaaagctgaagagggac 
ataaaacccaagaagcacctatagttgaagaaggctacaaggttaataac 
gttcatcaaactgatactacagttaaagcgtctgatttaccaaagactaa 
gacagtttccgcagttcatatggctagaacagacaataaacagataactt 
cacatcagacacatgttgaaaaacaaattaaaaatacattgccatccact 
ggtgacagcaaacgtggttattatatcactggaatggctatcgttatgct 
gagtgtattatttagtttagctaaaaagtttaaaagcaaatat 

SEQ ID NO. 5102 
STRAIN A909 

TTGAATAATAAAGGTGTCGGTGGCGAT 

GGTGTCCAAATTTAT CAATACT AT AT C AAAATGGACAAC AAT AAACCTTA 
C T T AAGT C C C AAAG AT AAGACT ACT GT AG AGAAGT TAG AAG AT CG C T GG A 
AAAAAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTT 
TATCTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCT 
TAT C AC AC CT C C AGGAT T T AAAAAAG AAGAT AAAAAAG T T G AAAAAC C AA 
AAT T AGAC C G T C C AC C AG GAAT T GAT T T AC C a C C AC C AACT T C AAT GAGA 
AG T T T T GAT TAT T C AAC C C C AC C G G G AACT AAG C C AAG C AAAC C C AAAG A 
TAGTTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGATG 
AAG C AC T AAAG G AT AGT AAAAAAG AC G CT AT T G AAG AT AAAT C AGG AG C A 
AT TAAATATGCTAAGTCT CTT CAACTT AGCTTT GTT GATGACCCT ATTT T 
AG CT AG C AAAGT AAAT GG C AAAAT AT T AC AAGT C GAAT C T GAT GG C AAAT 
T AGT CAT T C C TAG AAAT GC T T T GT C AGC T AAT C AAT T T GAT G AC AC TAG T 
CT T AAAAT T T AT C G T AAT AAT AAT CG C AAT AAAG AAAT T AC TAT C AC AAC 
AG AT TAT T T T G C AGAT AC AAAAT AT GT C AAT AT C AC AG C G GT T G AC TAT T 
TGAGCAATACTACTTTTGAGCAATTAGCTACTGGTGAAACAGTAGATTAC 
CATGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAA 
GAT T T AT GT T AAC GAT AAAT T G C AAGAAAC T T C T C GT AT AG C G CT T AAAG 
AT AAAT C T G T T AAGAT T G G T AT T GAAT T AC C AAAT GAT G T C AG AC AT AT T 
GATAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATAT 
CTT G AAAAAT GAT GAAC AAG AC AT T AAT C T C AG C AAAAC T T AC C AAT T AA 
AATACAACCCGACAAATCGTCGTCTAGAGTTTACTATTAATAACATTAAC 
T C AAGT T C AGAAAT CAT G AC C AC T T T C AAAG AT G G AAAG AT GC C AG AAT T 
GG T T G Aa C AAAAAG AT GT T T C T T T GGAT AT Aa a C G AT AT GGAC AT G AGT A 
AGTTTAAAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAA 
C T TAT T G C AAAAAC T GG AAC AGT T GAAT T AGAT AT G T T T T T C AAAC AAT C 
T C AAGAC C C AG CTT C AAT TAT T AAAAAAAT AT AC C T TAT C C AAAAT GG T G 
TTCCAAATGAATTGAAAAAATTTG ACT CTAGTTTTGGTTT AAC TG AAAGT 
C AG AT AGAT GGAT AC TAT AT T TAT AAAG AT G C AAT T AAC C T T AAAT T T AA 
AT T AAC C AGT GGT G C AAG T C T T AAAG T T GT T TAT AAAG G G C AAG AAGAT C 
CAT AT AGT CAT C AG AAAG AAGAT AT G AC T AAAAAAG G T GAAC AG CT C AGT 
CAT T C AAC T C AAG C C AAT G AAAAT AC AG C AAAAGT AAC CT T T G C T AAT AT 
T G AC T GGT C AC AT T AT AG T AAGGT T AC T G T GAAT GGAAAAG AAG T T G GT A 
AAGGTAGTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACAT 
AAAAC AGAAAAT T CAT T AAAT G T T AAAAGT T T GAT TAT GG AG AC GG GT AG 
T GT AAGT AAG AAAG TTCAACAACTTCCTTT AAGT CCT AG ATT AT CTAAAA 
AT AAG CAT AT GAG GGAT AT G C T AC T T AC TAT G C AAAAAG AT T C AG C GT AT 
T AC G Aa a C AAGT G AC AG T C T AGT C C T T CG AAT T AAT CT C ACT G C AG AT AC 
TAAACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAATA 
T GAT GAT G AGAC AGT T T G C AGT T G C T GGAC C AC AAG AT GAT C CT G T T AGT 

GAACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAAC 
TGCTAGTGAGGCAACTCTaAATGGTAAGGAAATCACAGCATCTGGTATTA 
T CG GT C AC AT C AAGG AT GGT G AT AAAAG C AAG C AT GT T G AAGT C AAAAT G 
GT G AAT G AAAAT GG AG AC AT G CT AG GAAC C C C T G T TAT TAT T C AAG GT AA 
AG AC T T GACT AAT C GAAC AAAAC CAT T AAT GAG T GG AC G TAG AGT AC T T T 
ATGCCGGTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGTCGTTTT 
AAC AC T T G GAT TAG G GT T G AAGT GGT AACAGAAG C AG GAG AGAAAG C AAG 
TATTGTTCGTCGCATGTTCTTTGACCAATCAGtTCCAGAGCTTAACACAG 
CAGTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTT 
G C C AAAG AT G AC T CT C T AAAAC T AAAATT AT AT C AAG AT GAT T CAT T AC T 
T GAAT C T GT T G AT AAAAC CGGT CT T T AT AGT T T T AGAAAT GG T GT AGAAA 
T CACTAAAGAT AT GACAGTACCACT AGAAT TT GGAGAT AAT ATT ATTAAG 
T TAT CTGCTGTT G AC T TAT C AAAT TAT CGT CG T AAT GAG AC C C T T CAT AT 
CT AT AG AAAC CGT T T T GAT GT T AAAG C AAGC C AAAT GAC AG C TG AC AAAG 
GAGCTAAAGTAACTGTGGATATGTTGATGAAGCACTTAGTTGTTCCAGAA 
AT GG C AG GAGC T TAT AC AT T AAC AAT C GAC G AAG AT C C AAAC AC AAAT GA 
AT C AGGAAT GT T AAC AAACGCT AAAGT AT CG ATT CAT TAT GT AAAT GGT G 
GT GT T GAT AAAGT T GAT GT T C CG AT T AAAGT AGT T GAC T T AG AAG CT AT T 
CGT AAAG C T G AAGAAG C AC AT AAAG C T G ACG AAG C AC G T AAAG CT G AAGA 
AG C AC GT AAAG CT GAAG AAG C AC G T AAAG CT G AAG AAG C AC GT AAAG CT G 
AAG AGGG AC AT a AAAC C C AAGAAG C AC C TAT AGT T G AAG AAGG CT AC AAG 
G T T AAT AAC GT T CAT C AAAC T GAT ACT AC AGT T AAAG C GT C T GAT T T AC C 
AAAGAC T AAG AC AGT T T C C G C AGT T CAT AT GGC TAGAACAGAC AAT AAAC 
AG AT AAC T T C AC AT C AG AC AC AT G T T G AAAAAC AAAT T AAAAAT A 

SEQ ID NO. 5103 
STRAIN H36B 

T G GT GT C C AAAT T TAT C AAT AC TAT AT C AAAAT G GAC AAC AAT AAACC T T 
ACTTAAGTCCCAAAGATAAGACTACTGTAGAGAAGTTAGaaGATCGCTGG 
AAAAAAAT T AC T T T C AAAGT T C AGG AT ACT GG CAT T G GT T T G AAAG AC G T 
TTATCTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACC 
T TAT C AC AC CT C C AGGAT T T AAAAAAG AAG AT AAAAAAG T T G AAAAAC C A 
AAAT TAG AC C GT C C AC C AGG AAT T GAT T T AC C AG C ACC AAC T T C AAT GAG 
AAGT T T T GAT TAT T C AAC C C C AC CGGG AACT AAG C C AAG C AAAC C CAAAG 
ATAGTTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGAT 
GAAG C AC T AAAG GAT AG T AAAAAAGACG CT AT T GAAG AT AAAT C AGGAGC 
AATTAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTT 
TAG C TAG C AAAG T AAAT GG C AAAAT AT T AC AAGT C GAAT C T GAT GG C AAA 
T T AG T CAT T C C T AGAAAT G CT T T GT C AG C T AAT C AAT T T GAT GAC AC TAG 
T C T T AAAAT T TAT C GT AAT AAT AAT CG C AAT AAAGAAAT T a c TAT C AC AA 
C AG AT TAT T T T G C AG AT AC AAAAT AT GT C AAT AT C AC AG C G G T T GAC TAT 
TTGAGCAATACTACTTTTGAGCAATTAGCTACTGGTGAAaCAGTAGATTA 
CCATGCCATTGTAtTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTA 
AG AT T TAT G T C AAC GAT AAAT T GC AAGAAACT T CT C G TAT AG C GCT T AAA 
GAT AAAT C T GT T AAGAT T G GT AT T GAAT T AC C AAAT GAT G T C AG AC AT AT 
T GAT AGT T TAT CTGTTCGTCGTTT GAAT G AGG T T AAAAC T GT T GAT AAT A 
TCTTGAAAAATGATGAACAAGACATTAATCTCAGCAAAACTTACCAATTA 
AAAT AC AAC C C GAC AAAT CGT CGT C T AG AGTT T AC TAT T AAT AAC AT T AA 
C T C AAG T T C AG AAAT CAT GAC C ACT T T CAAAG AT G G AAAG AT G C C Ag AAT 
T G GT T G AAC AAAAAG AT GTTTCTTTG GAT AT AAAC GAT AT GGAC AT GAGT 
AAG T T T AAAAC TAT T C G AC T T GGAC G AAAGG AT T C T GAAT T T AAG G GAC A 
ACT TAT T G C AAAAACT GG AAC AGT T GAAT T AG AT AT GT TT T T C AAAC AAT 
C T C AAGAC C C AG C T T C AAT TAT T AAAAAAAT AT AC C T TAT C C AAAAT GGT 
GTTCCAAATGAATTGAAAAAATTTG ACT CT AGTT TTGGTTTAACTG AAAG 
TCAGATAGATGGATACTATATTTATAAAGATGCAATTAACCTTAAATTTA 
AAT T AAC C AG T G G T G C AAG T C T T AAAGT T GT T TAT AAAGGG C AAG AAGAT 
C CAT AT AG t CAT C AG AAAGAAGAT AT G ACT AAAAAAG GT G AAC AG CT C AG 
T CAT T C AAC T C AAG C C AAT G AAAAT AC AGCAAAAGT AAC C T T T G C T AAT A 
TTGACTGGTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGT 
AAAG GT AG T GAG T T AC C T T T AACT AAAGG AT G GAC AAC AT T T G TAT T AC A 
T AAAAC AGAAAAT T CAT T AAAT GT T AAAAG T T T GAT TAT G G AGACGG G T A 
G T G T AAGT AAGAAAG T T C AAC AAC T T C CT T T AAGT C C TAG AT TAT C T AAA 
AAT AAG CAT AT G AGGGAT AT G C T AC T T AC TAT G C AAAAAGAT TCAGCGTA 
T T AC G AAAC AAG T GAC AGT C T AGT C C TT C GAAT T AAT C T C AC T G C AG AT A 
C TAAACTTAATTTTAATGCTGTT AAAGG AG CGAGTGCTCTT ACT G AAAAT 
AT GAT GAT G AGAC AG T T T G C AGT T GCT GGAC C AC AAG AT GAT CCT GT T AG 
T GAAC AT AAAT AC C CAT C AGT AT T T CT C T T AAC TCCTGCCT TAT T G G AAA 
C T G CT AG T GAG G C a AC T C T AAAT GGT AAGGAAAT C AC AG CAT CT GGT AT T 
AT CGGT C AC AT C AAGG AT GG t GAT AAAAG C AAGC AT GT T GAAGT C AAAAT 
GGTGAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTA 
AAGACT T GACT AAT C G AAC AAAAC C AT T AAT G AGT GGACGT AG AGT AC T T 
TATGCCGGTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGTCGTTT 
T AAC a CT T G GAT TAG G GT T GAAGT GGT AAC AG AAGC AGG AG AG AAAGC AA 
GT AT TGTTCGTCG C AT GT T CT T T GAC C AAT C AGT T C C AGAG C T T AAC AC A 
GCAGTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGT 
T GCC AAAGAT GAC T CT CT AAAACT AAAAT TAT AT C AAG AT G ATT CAT TAG 
T T G AAT CT G T T GAT AAAAC C GGT CT T TAT AGT T T T AGAAAT G GT GT AG AA 
AT C ACT AAAGAT AT GAC AG T AC C AC TAG AAT T TGG AG AT AAT ATT ACT AA 
GTTATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATA 
T CT AT AG AAAC C GT T T T G AT GT T AAAGC AAG C C AAAT GAC AG C T GACAAA 
G G AGCT AAAGT AAC T GT GG AT AT GT T GAT G AAG C AC T T AGT T GT T C CAG A 
AAT GG C AGGAG C T TAT AC AT T AAC AAT C GAC GAAG C T C C AAAC AC AAAT G 
AAT C AGG AAT G T T AAC AAACG C T AAAGT AT CG AT T CAT T AT GT AAAT GG T 
GGT GT TGAT AAAG T T GAT GT T C CGAT T AAAGT AGT T GACT T AGAAG C T AT 
TCGTAAAGCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTGAAG 
AAGC ACGT AAAG CT GAC GAAG C AC AT AAAG C T G AAG AAGT AC GT AAAGCT 
GAAG AAG C AC AT AAAGT CGAAG AAG C AC GT AAAG CT GAAG AG G GAC AT AA 
AAC C C AAGAAG C AC CT AT AGT T GAAGAAGGC T AC AAGGT T AAT AAC GT T C 
AT C AAAC T GAT ACT AC AGT T AAAGC GT C T GAT T TAG C AAAG AC T AAG AC A 
GT T T C C G C AG T T CAT AT GGCT AGAAC AG AC AAT AAAC AGAT AAC T T C AC A 
T CAG AC AC AT G 

SEQ ID NO. 5104 
STRAIN 18RS21 

TTGAATAATAAAGGTGTCGGTGGCGATGGTGTCCAA 

AT T T AT C AAT AC TAT AT C AAAAT G GAC AAC AAT AAAC CT T AC T T AAGT C C 
C AAAG AT AAGAC T AC T G TAG AG AAGT TAG AAG AT C G CT G G AAAAAAAT T A 
CTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTTTATCTTCAA 
TCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCTTATCACACC 
T C CAG GAT T T AAAAAAGAAG AT AAAAAAGT T G AAAAAC C AAAAT TAGAC C 
GT C C AC C AGG AAT T GAT T T AC CAG C AC C AAC T T C AAT GAG AAGT T T T GAT 
TATTCAACCCCACCGGGAACTAAGCCAAGCAAACCCAAAGATAGTTTATC 
AACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGaTGAAGCACCAA 
AG GAT AGT AAAAAAG AC G C T AT T GAAG AT AAAT C AGGAG C AAT T AAAT AT 
GCT AAGT CTCTT CAACTTAG CT TTGTTGAT GAC CCT ATT TTAGCTAGCAA 
AGTAAATGGCAAAATATTACAAGTCGAATCTGATGGCAAATTAGTCATTC 
CT AG AAAT G C T T T GT CAG C T AAT C AAT T T GAT GAC AC T AGT C T T AAAAT T 
TAT C G T AAT AAT AAT CG C AAT AAAGAAAT T ACT AT C AC AAC AG AT TAT T T 
T G C AGAT AC AAAAT AT GT C AAT AT C AC AGC GGT T GAC TAT T T GAG C AAT A 
C T AC T T T T GAG C AATT AG C TACT GGT G AAAC AGT AG AT T AC CAT G C C ATT 
GT AT T T T C AAG CT T T GC T GCT AT T AAAG AC AAG GGT GGT AAG AT T T AT GT 
T AAC GAT AAATTGCAAGAaACTTCTCGT AT AGCGCTT AAAGAT AAAT CTG 
TTAAGATTGGTATTGAATTACCAAATGATGTCAGACATATTGATAGTTTA 
TCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCTTGAAAAA 
T GAT G AAC AAG AC AT T AAT C T CAG C AAa ACT T AC C AAT T AAAAT AC AAC C 
CG AC AAAT CG T C GT CT AG AG T T T AC TAT T AAT AAC AT T AAC T C AAGT T C A 
G AAAT CAT GAC C ACT T T C AAAG AT GGAAAGAT G C C AGAAT T G G T T G AAC A 
AAAAG AT GT T T C T T T GGAT AT a AAC GAT AT GG AC AT G AGT AAGT T TAAAA 
CTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAACTTATTGCA 
AAAAC T GG AAC AGT T G AAT T AG AT AT GT T T T T C AAAC AAT C T C AAG AC C C 
AG CT T C AAT TAT T AAAAAAAT AT AC CT TAT C C AAAAT GG T G T T C C AAAT G 
AATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAGTCAGATAGAT 
GGAT AC TAT AT T TAT AAAG AT G C AAT T AAC C T T AAAT T T AAAT T AAC CAG 
T G GT G C AAG T C T T AAAG T T GT T T AT AAAGGG C AAG AAG AT C CAT AT AGT C 
AT CAG AAAG AAG AT AT G ACT AAAAAAG GTG AAC AGCT C AGT CAT T CAACT 
C AAG C C AAT G AAAAT AC AG C AAAAGT AAC C T T T G C T AAT AT T GACT G GT C 
ACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGTTAAAGGTAGTG 
AGT T AC C T T T AAC T AAAG GAT G GAC AAC AT T T G TAT T AC AT AAAAC AG AA 
AAT T C AT T AAAT GT T AAAAGT T T GAT TAT GG AG AC GGGT AG T GT AAG T AA 
G AAAGT T C AAC AAC T T C CT T T AAGT C CT AGAT TAT C T AAAAAT AAG CAT A 
T G AGG GAT AT G CT AC T T AC TAT G C AAAAAG AT T CAG C GT AT T ACG AAAC A 
AGT GAC AGT C T AG T C C T T C G AAT T AAT CT C AC T G CAG AT AC T AAAC T T AA 
T T T T AAT GCT GT T AAAG GAG C G AGT G C T C T T AC T G AAAAT AT GAT GAT G A 
GAC AG T T T GC AG T T G C T G GAC CAC AAG AT GAT C CT G T T AGT GAAC AT AAA 
TACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAACTGCTAGTGA 
GG C AAC T C T AAAT GGT AAGG AAAT C AC AGC AT C T G GT AT T AT CG G T C AC A 
T C AAG GAT GGT GAT AAAAG C AAG C AT GTT G AAGT C AAAAT GGT G AAT GAA 
AAT GGAGAC AT G CT AGG AAC C C CTG T TAT TAT T C AAGGT AAAGAC T T GAC 
T AAT C G AAC AAAAC CAT T AAT GAGT GG AC GT AG AG T AC T T T AT GC CG GT A 
AAC AAT AT GAG T T C C GG GC T AAAT T AC C AC T T AGT CGT T T T AAC AC T T G G 
ATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAGTATTGTTCG 
T CG C AT GT T C T T T GAC C AAT C AGT T C C AGAG C T T AAC AC AG C AGT T G C T A 
AACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTTGCCAAAGAT 
GACT CT C T AAAAC T AAAAT TAT AT C AAG AT GAT TC ATT AC tTGAATCTGT 
T GAT AAAAC C G G T C T T TAT AGT T T TAG AAAT GGT GT AG AAAT C AC T AAAG 
AT AT GAC AGT AC C ACT AG AAT T T G G AGAT AAT AT TAT T AAGT TAT C T G C T 
GT T GAC T T AT C AAAT TAT C GT C G T AAT GAG AC C C T T C AT AT CT AT AGAAA 
C C GT T T T GAT GT T AAAG C AAG C C AAAT GAC AG C T GAC AAAGGAGCT AAAG 
T AAC T GT G G a TAT GT T GATG AAG C ACT T AGT T GT T C C AG AAAT G G C AG G A 
G CT T AT AC AT T AAC AAT C G ACG AAGC T C C AAAC AC AAAT GAAT C AGG AAT 
GT T AAC AAAC GC T AAAGT AT C GAT T CAT T AT GT AAAT GGTGGTGTT GAT A 
AAGT T G AT GT T C C GAT T AAAG TAG T T GAC T TAG AAG C TAT T C GT AAAG C T 
GAAGAAGCACGTAAAGCTGAAGAAGCACGTAAAGCTGAAGAGGGACATAA 
AACCCAAGAAGCACCTATAGTTGAAGAAGGCTACAAGGTTAATAACGTTC 
AT C AAACT G AT AC T AC AGT T AAAG C GT C T GAT T T AC C AAAG ACT AAG AC A 
GT T T C C G CAG T T CAT AT GG C TAG AAC AG AC AAT AAAC AGAT AAC T T CAC A 
TCAGACACATGTTGAA 

SEQ ID NO. 5105 
STRAIN M732 

TTGAATAATAAAGGTGTCGGTGGCGATGGTGTCC 

AAAT T TAT C AAT AC TAT AT C AAAAT GG AC AAC AAT AAAC C T T ACT T AAGT 
C C C AAAGAT AAG AC TAG T G T AG AGAAG T TAG AAG AT CG CT G GAAAAAAAT 
TACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTTTATCTTC 
AATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCTTATCACA 
C C T C C AGG AT T T AAAAAAGAAGAT AAAAAAGT T G AAAAAC C AAAAT TAG A 
CCGTCCac C AGG AAT T GAT T T AC CAG CAC C AAC T T C AAT G AGAAGT T T T G 
AT TAT T C AAC C C CAC C GGG AAC T AAGC C AAG C AAAC C C AAAG AT AGT T T A 
TCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGATGAAGCCAC 
CAAAGG AT AGT AAAAAAGACG CT AT T GAAG AT AAAT CAG GAG C AAT T AAA 
TATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTTTAGCTAG 
C AAAGT AAAT GG C AAAAT AT T AC AAGT CG AAT C T GAT G G C AAAT T AGT C A 
TTCCTAGAAATGCTTTGTCAGCTAATCAATTTGATGACACTAGTCTTAAA 
ATT TAT CGT AAT AAT AAT CGC AAT AAAG AAAT TACT AT CAC AAC AG ATT A 
T T T T GC AG AT AC AAAAT AT GT C AAT AT CAC AG C GG T T GACT AT T T GAG C A 
AT ACT AC T T T T GAG C AAT TAG C T ACT G GT G AAAC AGT AGAT T AC CAT GC C 
ATTGT ATTT T CAAGCTTTGCT GCT ATTAAAGACAAGGGT GGTAAGAT TT A 
T GT T AAC GAT AAAT T G C AAG AAACT T C T C G TAT AG C GC T T AAAG AT AAAT 
CTG T T AAG AT T G GT AT T GAAT T AC C AAAT G AT GT CAG AC AT AT T GAT AGT 
TTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCTTGAA 
AAAT GAT GAAC AAG AC AT T AAT C T CAG C AAAACT T AC C AAT T AAAAT AC A 
AC C C GAC AAAT C GT C GT C TAG AG T T TAG TAT T AAT AAC AT T AAC T C AAGT 
T CAG AAAT CAT GAC CAC T T T C AAAGAT GG AAAGAT GC C AGAAT T G G T T G A 
AC AAAAAGAT GT T T C T T T GG AT AT AAAC GAT AT G GAC AT GAGT AAG T T T A 
AAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGGACAACTTATT 
G C AAAAAC T G GAAC AGT T GAAT TAG AT AT GT T T T T C AAAC AAT CT C AAG A 
C C CAG C T T C AAT TAT T AAAAAAAT AT AC C T TAT C C AAAAT GG t GT T C C AA 
AT GAAT T G AAAAAAT T T GAC T CT AG T T TT GGT T T AAC T GAAAG T CAG AT A 
GAT G GAT ACT AT AT T TAT AAAG ATG C AAT T AAC CT T AA aT T T AAAT T AAC 
CAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATCCATATA 
GT CAT CAG AAAG AAG AT AT GAC T AAAAAAG GT G AAC AG CT CAG T CAT T C A 
ACT C AAG C C AAT G AAAAT AC AG C AAAAG T AAC CT T TG C T AAT AT T G ACT G 
GTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGTAAAGGTA 
GTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACATAAAACA 
GAAAATT C AT T AAATGT TAAAAGTTT GATT ATGGAG ACGGGT AGT GT AAG 
T AAG AAAGT T C AAC AACT T c CT T T AAG T C C TAG AT TAT CT AAAAAT AAG C 
AT AT GAG GG AT AT G C T ACT TACT AT G C AAAAAGAT T CAG C GT AT T AC GAA 
ACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATACTAAACT 
T AAT T T T AAT G C T GT T AAAG GAG C G AGT G CT CT T AC T G AAAAT AT GAT G A 
T G AGAC AGT T T GC AGT T G C T G GAG C AC AAGAT GAT C CT G T T a GT G AAC AT 
AAATACCCATCAGTaTTTCTCTTAACTCCTGCCTTATTGGAAaCTGCTAG 
T GAGG CAACT CT AAAT GGT AAG G AAAT CAC AG CAT C T GG T AT TAT C G GT C 
AC AT C AAGGAT G GT G AT AAAAG C AAGC AT GT T G AAG T C AAAAT GGT G AAT 
G AAAAT GGAG AC AT G C T AGG AAC C C C T GT T AT TAT T C AAGGT AAAGACT T 
GACT AAT CGAAC AAAAC CAT T AAT GAG T GGAC G TAG AGT AC T T TAT G CCG 
GTAAACAATATGAGTTCCGGGCTAAATTACCACTTAGtCGTTTTAACACT 
TGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAGTATTGT 
T C G T C G C AT GT T C T T T G AC C AAT C AGT T C C AG AGCT T AAC AC AG CAGT T G 
CTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTTGCCAAA 
GAT G AC T C T C T AAAAC T AAAAT TAT AT C AAG AT GAT T CAT T AC T T G AAT C 
TGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAAATCACTA 
AAGAT AT GAC AGT AC CAC T AGAATT T G GAG AT AAT AT TAT T AAGT T AT CT 
G C T GT T G AC T T AT C AAAT TAT C GT C GT AAT G AGAC C C T T CAT AT C TAT AG 
AAAC CGT T T T GAT GT T AAAG C AAGC C AAAT GAC AG C T GAC AAAGGAG C T A 
AAGT AACTGTGGATATGTT GAT GAAGCACTT AGT TGTTCCAGAAATGGCA 
GGAG C T TAT AC AT T AAC AAT C G ACGAAG C T C C AAAC AC AAAT G AAT C AG G 
AATGTTAACAAACGCTAAAGT ATCGAT T CAT TATGT AAAT GGTGGTGT T G 
ATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGCTATTCGTAAA 
GCTGAAGAAGCACATAAAGCTGACGAAGCACGTAAAGCTGAAGAAGCACG 
TAAAGCTGAAGAAGCACATAAAGCTGAAGAAGTACGTAAAGCTGAAGAAG 
C AC AT AAAGT C GAAG AAGC AC GT AAAG C T G AAGAG GGAC AT AAAAC C C AA 
GAAGCACCTATAGTTGAAGAAGGCTACAAAGTTAATAACGTTCATCAAAC 
T GAT ACT AC AGT T AAAG CG T C T GAT T T AC C AAAG AC T AAG AC AGT T T C C G 
CAGTT CATAT GGCTAGAAC AGACAATAAAC AGAT AACTT C ACAT C AG AC A 
CAT GT T GAAAA 

SEQ ID NO. 5106 
STRAIN COH1 

TTGAATAATAAAGGTGTCGGTGGCGATGGT 

GTCCAAATTTATCAATACTATATCAAAATGGACAACAATAAACCTTACTT 
AAG T C C C AAAG AT AAG ACT AC T GT AG AG AAGT TAG AAGAT CG C T GG AAAA 
AAAT T ACTT T C AAAG T T C AG GAT ACT GG CAT T G GT T T G AAAGAC GT T TAT 
C T T C AAT CT GT T AAGT AT GTTGGTGG T GGC AAT AAT AAT T TAG AC C T TAT 
CAC AC C T C C AG GAT T T AAAAAAG AAG AT AAAAAAG TT G AAAAAC C AAAAT 
TAG AC C GT C C ACC AG G AAT T GAT T T AC C AG CAC CAACT T C AAT GAG AAG T 
T T T GAT TAT T C AAC C C CAC C GG GAAC T AAG C C AAG C AAAC C C AAAG AT AG 
TTTATCAACTCCTCCAGGtTTCCCAGATTTAAACACGCCGCCGGATGAAG 
C C a C CAAAGGAT AGT AAAAAAG AC G C T AT T GAAGAT AAAT C AG GAG C AAT 
T AAAT ATGCT AAGT CTCTTC AACTT AG CTTTGTTGATGACCCTATTTT AG 
C T AG C AAAG T AAAT GG C AAAAT AT T AC AAGT C GAAT CT G AT GG C AAAT T A 
GT CAT T C C TAG AAAT GC T T T GT C AG C T AAT C AAT T T GAT GAC AC T AGT C T 
T AAAAT T TAT C GT AAT AAT AAT C GC AAT AAAG AAAT T AC TAT CAC AAC AG 
AT TAT T T T G C AG AT AC AAAAT AT G T C AAT AT C AC AGCGGT T GACT AT T T G 
AG C AAT ACT AC T T T T GAG C AAT TAG C TACT GGT G AAAC AGT AG AT T AC C A 
TGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAAGA 
TTTATGTTAACGATAAATTGCAAGAAACTTCTCGTATAGCGCTTAAAGAT 
AAAT C T GT T AAGAT T GGT AT T G AAT T AC C AAAT GAT G T C AGAC AT AT T G A 
TAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCT 
T GAAAAAT GAT G AAC AAGAC AT T AAT C T C AG C AAAAC T TAG C AAT T AAAA 
T AC AAC C CG AC AAAT C G T CGT C TAG AG T T TAG TAT T AAT AAC AT T AAC T C 
AAGT T C AGAAAT CAT GAC CAC T T T C AAAG AT GG AAAG AT G C C AG AAT T GG 
TTGAACAAAAAGATGTTTCTTTGGATATAAACGATATGGACATGAGTAAG 
T T T AAAAC TAT T CG AC T T G GAC G AAAGG AT T CT G AAT T T AAG GGAC AAC T 
TAT T GC AAAAACT G GAAC AGT T GAAT TAG AT AT G T T T T T C AAAC AAT C T C 
AAGACCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAATGGTGTT 
CCAAATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGAAAGTCA 
GAT AG AT G GAT AC TAT AT T T AT AAAG AT G C AAT T AAC CT T AAAT T T AAAT 
TAACCAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATCCA 
TAT AGT CAT C AG AAAG AAG AT AT GACT AAAAAAG GT GAAC AG C T CAGT C A 
T T C AAC T C AAG C C AAT G AAAAT AC AG C AAAAGT AAC C T T T G C T AAT AT T G 
AC T G GT CAC AT TAT AGT AAGG T TACT GT G AAT GG AAAAG AAGT T G GT AAA 
GGTAGTGAGTTACCTTTAACTAAAGGATGGACAACATTTGTATTACATAA
AAC AG AAAAT T CAT T AAAT GT T AAAAG T T T G AT T AT GG AGAC G GGT AGT G 
T AAGT AAG AAAG T T C AAC AAC T T C C T T T AAGT C CT Ag AT TAT CT AAAAAT 
AAGC AT AT G AGGG AT AT GC T AC T TACT AT G C AAAAAGAT T C AGC G TAT T A 
CGAAACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATACTA 
AAC T T AAT T T T AAT G C T GT T AAAGG AG CG AGTGC T CT T AC T GAAAAT AT G 
AT GAT GAGAC AGT T T G C AGT T G C T G G AC C AC AAGATG AT C C T GT T AGT GA 
ACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAACTG 
CT AGT GAG G C AAC T C T AAAT GGT AAGGAAAT C AC AGC AT C T G GT AT TAT C 
GGT CACAT CAAGGAT GGTG AT AAAAGC AAGC ATGT TGAAGTC AAAAT GGT 
GAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAGGTAAAG 
ACTTGACTAATCGAACAAAACCATTAATGAGTGGACGTAGAGTACTTTAT 
G C C G GT AAAC AAT AT G AGT T C C GGG CT AAAT T AC C AC T T AGT C GT T T T AA 
C ACT T G GATT AGGGT T GAAG T GG T AAC AGAAG C AGG AG AG AAAG C AAG T A 
TTGTTCGTCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAACACAGCA 
GTTGCTAAACGTGATTtGACTTCTGATACTGCTCTTATCCACATCGTTGC 
C AAAG AT GAC T C T C T AAAa C T AAAAT TAT AT C AAG AT GAT T CAT T AC T T G 
AAT CT GT T G AT AAAAC C G GT CT T TAT AGT T T T AGAAAT GGT G TAG AAAT C 
ACT AAAG AT AT G AC AGT AC C ACT AG AAT T T GGAGATAATAT TAT T AAGT T 
ATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATATCT 
AT AG AAAC C GT T T T GAT GT T AAAG C AAGCC AAAT GAC AG CT G AC AAAGGA 
G C T AAAGT AAC T GT GG AT AT GT T GAT G AAGC ACT T AGT T GT T C C AGAAAT 
GGCAGGAGCT TAT ACATT AACAAT CGACGAAGCT C CAAACACAAAT GAAT 
CAGGAATGTTAACAAACGCTAAAGTATCGATT CAT T ATGT AAAT GGT GGT 
G T T GAT AAAGT T G AT GT T C C GAT T AAAGT AGT T G ACT T AGAAG C T AT T C G 
T AAAGC T GAAG AAG CACAT AAAG C T GAC GAAG C AC GT AAAG C T G AAGAAG 
C AC GT AAAG C T GAAGAAG CACAT AAAG CT GAAGAAGT AC GTAAAGC T G AA 
GAAG CACAT AAAGT C G AAGAAG C AC GT AAAGC T GAAG AG G GAC AT AAAAC 
C C AAG AAG C AC C TAT AGT T GAAGAAG G CT AC AAAGT T AAT AACGT T CAT C 
AAACT GAT ACT AC AGT T AAAGC GT C T GAT T T AC C AAAG AC T AAG AC AG T T 
T CCGCAGTT C AT ATGG CT AGAACAG ACAAT AAACAGATAACTT CAC AT CA 
GACACATGT 

SEQ ID NO. 5107 
STRAIN M781 

TTGAATAATAAAGGTGTCGGTGGCGATGGT 

GT C CAAAT T TAT C AAT AC TAT AT C AAAAT GG AC AAC AAT AAAC C T T ACT T 
AAGT C C C AAAG AT AAGAC T AC T GT AG AG AAG T T AGAAG AT C G C T GG AAAA 
AAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGACGTTTAT 
CTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAGACCTTAT 
CAC AC C T C C AGG AT T T AAAAAAGAAG AT AAAAAAGT T G AAAAAC C AAAAT 
TAG AC C GT C CAC C AGG AAT T GAT T T AC CAG C AC C AAC TT C AAT GAG AAG T 
T T T GAT TAT T C AAC C C CAC CG G G AAC T AAGC C AAG C AAAC C C AAAG AT AG 
TTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCGGATGAAG 
C C a C C AAAGG AT AGT AAAAAAG AC GCT AT T GAAG AT AAAT C AGGAG C AAT 
TAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTATTTTAG 
CT AG C AAAGT AAAT G G C AAAAT AT T AC AAGT C GAAT C T GAT G G CAAAT T A 
GT CAT T C C TAG AAAT G C T T T GT C AGCT AAT C AAT T T GAT GAC AC T AGT C T 
T AAa AT T TAT C G T AAT AAT AAT CG C AAT AAAG AAAT T a C T AT CAC AAC AG 
AT TAT T T T G CAG AT AC AAAAT AT GT C AAT AT CAC AG C GGT T GAC TAT T T G 
AG C AAT AC TACT T T T GAG C AAT T AGC TACT G GT G AAAC AG TAG AT T AC C A 
TGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTGGTAAGA 
T T T AT G T T AAC GAT AAAT T G C AAG AAAC T T C T CG T AT AG C G C T T AAAGAT 
AAATCTGTTAAGATTGGTATTGAATTACCAAATGATGTCAGACATATTGA 
TAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATAATATCT 
T G AAAAAT GAT G AAC AAG AC AT T AAT C T CAG C AAAACT T AC C AAT T AAAA 
T AC AAC C C GAC AAAT C GT C GT C TAG AG T T TACT AT T AAT AAC AT T AAC T C 
AAG T T C AGAAAT CAT GAC CAC T T T C AAAG AT GG AAAGAT G C CAG AAT T GG 
T T GAAC AAAAAGAT GTTTCTTTG GAT AT AAAC GAT AT GG AC AT G AGT AAG 
TTT AAAACT ATT CGACTTGGACG AAAGG AT TCT GAAT TTAAGGGACAACT 
T ATTGCAAAAACTGGAAC AGT TGAATTAGATATGTTTTTC AAAC AAT CTC 
AAGAC C CAGCTTC AAT T ATT AAAAAAAT AT ACCTT AT CC AAAAT GGT GTT 
C CAAAT GAATTGAAAAAAT TTGACT CT AGT T TT GGT TTAACTGAAAGT CA 
GAT AG AT GG AT AC TAT AT T TAT AAAG AT GC AAT T AAC C T T AAAT T T AAAT 
TAACCAGTGGTGCAAGTCTTAAAGTTGTTTATAAAGGGCAAGAAGATCCA
T AT AGT CAT C AG AAAG AAG AT AT G AC T AAAAAAG GT G AAC AG C T C AGT C A 
T T C AAC T C AAG C C AAT GAAAAT AC AG C AAAAGT AAC CT T T G C T AAT AT T G 
ACTGGTCACATTATAGTAAGGTTACTGTGAATGGAAAAGAAGTTGGTAAA 
GGT AGT GAG T T AC C T T T AAC T AAAG G AT GGAC AAC AT T T GT AT T AC AT AA 
AAC AGAAAAT T CAT T AAAT GT T AAAAG T T T GAT T AT GGAGACG G GT AG T G 
T AAG T AAG AAAGT T C AAC AACT T C C T T T AAGT C CT AG AT TAT CT AAAAAT 
AAG CAT AT G AGGG AT AT G C TACT T AC TAT G C AAAAAGAT T C AGC GT AT T A 
CGAAACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAGATACTA 
AACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAAAATATG 
AT GAT GAGAC AGT T T GC AGT T G C T GGAC C AC AAG AT GAT C CT G T T AGT G A 
ACATAAATACCCATCAGTATTTCTCTTAACTCCTGCCTTATTGGAAACTG 
C T AGT GAG G C AAC T C T AAAT GG T AAGGAAAT C AC AG CAT C T G GT AT T AT C 
GGT CAC AT C AAG GAT G GT GAT AAAAG C AAG C ATGT T G AAG T C AAAAT GGT 
G AAT GAAAAT GG AG AC AT G C T AG G AAC C C CT G TT AT TAT T C AAGGT AAAG 
AC T T GAC T AAT CG AAC AAAAC CAT T AAT GAG T GG AC G T AGAG T AC T T T AT 
G C C GGT AAAC AAT AT G AGT T C CGGG C T AAAT T AC CAC T T AGT C G T T T T AA 
CACTTGGATTAGGGTTGAAGTGGTAACAGAAGCAGGAGAGAAAGCAAGTA 
TTGTTCGT CG C AT G T T C T T T G AC C AAT C AG T T C C AG AG CT T AAC AC AG C A 
GTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACATCGTTGC 
C AAAG AT GAC T CT C T AAAAC T AAAAT TAT AT C AAGAT GAT T C AT T AC T T G 
AATCTGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTAGAAATC 
AC T AAAGAT AT GAC AGT AC CAC TAG AAT T T G GAG AT AAT AT TAT T AAGT T 
ATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTCATATCT 
AT AG AAAC C G T T T T GAT G T T AAAG C AAG C C AAAT GAC AG C T GAC AAAGGA 
G C T AAAGT AACT GT G GAT AT GT T GAT G AAG CAC T T AGT T GT T C C AG AAAT 
GG CAGG AG C T TAT AC AT T AAC AAT C GAC G AAG C T C C AAAC AC AAAT GAAT 
CAGGAATGTTAACAAACGCTAAAGTATCGATTCATTATGTAAATGGTGGT 
GT T GAT AAAG T T GAT GT T C C G AT T AAAGT AGT T G AC T TAG AAG C TAT T CG 
T AAAG C T GAAG AAG CAC AT AAAG C T GAC GAAG C AC GT AAAG C T G AAGAAG 
C AC GT AAAG C T G AAG AAG CAC AT AAAG C T G AAGAAGT AC GT AAAG C T G AA 
GAAG CAC AT AAAGT C G AAGAAG CAC CG T AAAG C T GAAG AG GG AC AT AAAA 
C C C AAG AAG CAC C TAT AGT T GAAGAAG G C T AC AAAGT T AAT AAC G T T CAT 
C AAACT GAT AC T AC AGT T AAAG C G T CT G AT TT AC C AAAGAC T AAG AC AG T 
T T C C GC AG T T CAT AT GG CT AG AAC AGAC AAT AAAC AG AT AACT T CAC AT C 
AG AC AC AT GT T G 

SEQ ID NO. 5109 
STRAIN JM9130013 

T G GT GT C C AAAT T TAT C AAT AC TAT AT C AAAAT GGAC AAC AAT AAAC 
CT T AC T T AAG T C C C AAAG AT AAG AC TACT GT AG AGAAGT T AGAAG AT C GC 
TGGAAAAAAATTACTTTCAAAGTTCAGGATACTGGCATTGGTTTGAAAGA 
CGTTTATCTTCAATCTGTTAAGTATGTTGGTGGTGGCAATAATAATTTAG 
AC C T T AT CAC AC C T C C AGGAT T T AAAAAAGAAG AT AAAAAAGT T G AAAAA 
C C AAAAT T AGAC C GT C CAC C AG GAAT T GAT T T AC C AG C AC C AAC T T C AAT 
GAG AAGT T T T GAT TAT T C AAC C C C AC CGG G AAC T AAG C C AAG C AAAC CCA 
AAGAT AGTTTATCAACTCCTCCAGGTTTCCCAGATTTAAACACGCCGCCG 
GAT GAAG CAC C AAAG GAT AGT AAAAAAG AC G CT AT T GAAG AT AAAT CAGG 
AGCAATTAAATATGCTAAGTCTCTTCAACTTAGCTTTGTTGATGACCCTA 
T T T TAG C T AGC AAAGT AAAT G G C AAAAT AT T AC AAG T C GAAT C T GAT GG C 
AAAT T AGT CAT T C C TAG AAAT G CT T T GT C AG C T AAT C AAT T T GAT GAC AC 
TAG T C T T AAAAT T TAT C GT AAT AAT AAT CG C AAT AAAG AAAT TACT AT C A 
C AAC AG AT TAT T T T G C AG AT AC AAAAT AT GT C AAT AT CAC AG C G GT T GAC 
TAT T T G AGC Aa T AC T AC T T T T GAG C AAT TAG C T AC T GGT G AAAC AGT AG A 
TTACCATGCCATTGTATTTTCAAGCTTTGCTGCTATTAAAGACAAGGGTG 
GT AAG AT T T AT GT T AAC GAT AAAT T G C AAG AAAC T T C T C GT AT AG C GC T T 
AAAG AT AAAT C T G T T AAG AT T GGT AT T G AAT T AC C AAAT GAT G T C AG AC A 
TATTGATAGTTTATCTGTTCGTCGTTTGAATGAGGTTAAAACTGTTGATA 
AT AT C T T G AAAAAT GAT G AAC AAG AC AT T AAT CT C AG C AAAAC T T AC C AA 
TTAAAATACAACCCGACAAATCGTCGTCTAGAGTTTACTATTAATAACAT 
T AAC T C AAG T T C AG AAAT CAT GAC C ACT T T C AAAG AT G G AAAGAT G C C AG 
AAT T G G T T G AAC AAAAAGAT GTTTCTTTG G AT AT AAACG AT AT G GAC AT G 
AGTAAGTTTAAAACTATTCGACTTGGACGAAAGGATTCTGAATTTAAGGG 
AC AACT TAT T G C AAAAACT GG AAC AG T T GAAT TAG AT AT GT T T T T C AAAC 
AATCTCAAGACCCAGCTTCAATTATTAAAAAAATATACCTTATCCAAAAT
GGTGTTCCAAATGAATTGAAAAAATTTGACTCTAGTTTTGGTTTAACTGA 
AAGTCAGATAGAT GGAT ACT AT AT T T AT AAAGAT GCAAT T AACCTTAAAT 
TTAAATTAACCAGTGGTGCAaGTCTTAAAGTTGTTTATAAAGGGCAAGAA 
GAT CC AT AT AGT CAT C AGAAAG AAGAT AT G AC T AAAAr AGGT GAAC AGCT 
C AGT CAT T C AAC T C AAG C C AAT GAAAAT AC AGC AAAAGT AAC C T T T G CT A 
AT AT T GAC T GGT C AC AT TAT AG T AAGGT T ACT GT GAAT G GAAAAGAAGT T 
GGT AAAGG T AGT G AG T T AC CT T T AACT AAAGGAT GGAC AAC AT T T GT AT T 
ACAT AAAAC AG AAAAT T CAT T AAAT G T T AAAAGT T T GAT TAT GGAG ACGG 
GT AGT GT AAGT AAGAAAG T T C AAC AACT T C C T T T AAGT C C TAG AT TAT CT 
AAAAAT AAG CAT AT G AGGG AT AT G CT AC T T AC TAT G C AAAAAG AT T C AGC 
GTATTACGAAACAAGTGACAGTCTAGTCCTTCGAATTAATCTCACTGCAG 
ATACTAAACTTAATTTTAATGCTGTTAAAGGAGCGAGTGCTCTTACTGAA 
AAT AT GAT GAT GAG AC AGT T T G C AGT T G C T GGAC C AC AAGAT GAT CC T GT 
TAG T GAAC AT AAAT AC C CAT C AGT ATT T CT C T T AAC T C CT GC C T TAT T GG 
AAACTGCTAGTGAGGCAACTCTAAATGGTAAGGAAATCACAGCATCTGGT 
AT TAT C G GT C AC AT C AAG GAT GGT GAT AAAAG C AAG CAT GT T G AAGT C AA 
AATGGTGAATGAAAATGGAGACATGCTAGGAACCCCTGTTATTATTCAAG 
GT AAAGACT T GAC T AAT C GAAC AAAAC C ATT AAT G AGT GGAC GTAGAGT A 
C T T TAT G C C GG TAAAC AAT AT G AGT T C CGG GC T AAAT T AC C AC T T AGT CG 
T T T T AACACT T G GAT TAG G GT T G AAG T GGT AAC AGAAG C AG GAg a G a a a g 
cAaGTATTGTTCGTCGCATGTTCTTTGACCAATCAGTTCCAGAGCTTAAC 
ACAGCAGTTGCTAAACGTGATTTGACTTCTGATACTGCTCTTATCCACAT 
C GT T G C C AAAGAT G AC T C T CT AAAAC T AAAAT TAT AT C AAGAT GAT T CAT 
TACTTGAATCTGTTGATAAAACCGGTCTTTATAGTTTTAGAAATGGTGTA 
GAAAT CACT AAAGAT AT GAC AGT AC C AC TAG AAT T T GGAG AT AAT AT TAT 
TAAGTTATCTGCTGTTGACTTATCAAATTATCGTCGTAATGAGACCCTTC 
AT AT C T AT AG AAAC C GT T T T G AT GT T AAAG C AAG C C AAAT GAC AGC T GAC 
AAAGGAG C T AAAGT AACT GT G G AT AT GT T GAT G AAG C ACT T AGT T G T T C C 
AGAAAT GG C AG G AG CT TAT AC AT T AAC AAT C GAC G AAGC T C C AAAC AC AA 
AT GAAT CAGGAAT GTTAACAAACGCT AAAGT ATCGAT T CATT AT GTAAAT 
GGTGGTGTTGATAAAGTTGATGTTCCGATTAAAGTAGTTGACTTAGAAGC 
TAT T C GT AAAG C T G AAG AAG C AC AT AAAG CT G AC GAAGC AC GT AAAG CT G 
AAGAAGC AC GT AAAG C T G AAG AAGC AC AT AAAGC T G AAG AAG T AC GT AAA 
GCTGAAGAAGCACATAAAGTCGAAGAAGCACCGTAAAGCTGAAGAGGGAC 
AT AAAAC C C AAG AAG C AC C TAT AG T T G AAG AAGG C T AC AAG GT T AAT AAC 
GTTCATCAAACTGATACTACAGTTAAAGCGTCTGATTTACCAAAGACTAA 
GAC AGT T T C C G C AG T T CAT AT G G CT AG AAC AGAC AAT AAAC AG AT AAC T T 
CACATCAGAC ACATGT TG 

SEQ ID NO. 5110 
STRAIN 2603 frame: 1

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEAPKDSKKDAIEDKSGAIKYAKSLQLSFVDGPILASKV 
NGKILQVESDGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAV 
DYLSNTTFEQLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGI 
ELPNDVRHIDSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINS
SSEIMTTFKDGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELD 
MFFKQSQDPASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSG 
ASLKVVYKGQEDPYSHQKEDMTKKGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKE
VGKGSELPLTKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDML
LTMQKDSAYYETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSE 
HKYPSVFLLTPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTP 
VIIQGKDLTNRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEVVTEAGEKASIVRR
MFFDQSVPELNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNG 
VEITKDMTVPLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDM
LMKHLVVPEMAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIR
KAEEARKAEEARKAEEARKAEEGHKTQEAPIVEEGYKVNNVHQTDTTVKASDLPKTKTVS 
AVHMARTDNKQITSHQTHVEKQIKNTLPSTGDSKRGYYITGMAIVMLSVLFSLAKKFKSK 
Y 

SEQ ID NO. 5111 

STRAIN A909 frame: 1 
LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPPPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEALKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASKV 
NGKILQVESDGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAV 
DYLSNTTFEQLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGI 
ELPNDVRHIDSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINS 
SSEIMTTFKDGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELD 
MFFKQSQDPASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSG 
ASLKVVYKGQEDPYSHQKEDMTKKGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKE
VGKGSELPLTKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDML 
LTMQKDSAYYETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSE 
HKYPSVFLLTPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTP
VIIQGKDLTNRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEVVTEAGEKASIVRR
MFFDQSVPELNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNG 
VEITKDMTVPLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDM 
LMKHLVVPEMAGAYTLTIDEDPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIR
KAEEAHKADEARKAEEARKAEEARKAEEARKAEEGHKTQEAPIVEEGYKVNNVHQTDTTV 
KASDLPKTKTVSAVHMARTDNKQITSHQTHVEKQIKN 

SEQ ID NO. 5112 

STRAIN H36B frame: 2

GVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVYLQSVKYVGG 
GNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTKPSKPKDSLS 
TPPGFPDLNTPPDEALKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASKVNGKILQVES 
DGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAVDYLSNTTFE 
QLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGIELPNDVRHI 
DSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINSSSEIMTTFK 
DGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELDMFFKQSQDP 
ASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSGASLKVVYKG
QEDPYSHQKEDMTKKGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKEVGKGSELPL 
TKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDMLLTMQKDSAY 
YETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSEHKYPSVFLL 
TPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTPVIIQGKDLT 
NRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEVVTEAGEKASIVRRMFFDQSVPE
LNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNGVEITKDMTV 
PLEFGDNITKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDMLMKHLVVPE 
MAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIRKAEEAHKAD
EARKAEEARKADEAHKAEEVRKAEEAHKVEEARKAEEGHKTQEAPIVEEGYKVNNVHQTD
TTVKASDLPKTKTVSAVHMARTDNKQITSHQTH 

SEQ ID NO. 5113 

STRAIN 18RS21 frame: 1 

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEAPKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASKV 
NGKILQVESDGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAV 
DYLSNTTFEQLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGI
ELPNDVRHIDSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINS 
SSEIMTTFKDGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELD 
MFFKQSQDPASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSG 
ASLKVVYKGQEDPYSHQKEDMTKKGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKE
VGKGSELPLTKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDML
LTMQKDSAYYETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSE
HKYPSVFLLTPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTP 
VIIQGKDLTNRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEVVTEAGEKASIVRR
MFFDQSVPELNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNG 
VEITKDMTVPLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDM 
LMKHLVVPEMAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIR 
KAEEARKAEEARKAEEGHKTQEAPIVEEGYKVNNVHQTDTTVKASDLPKTKTVSAVHMAR 
TDNKQITSHQTHVE 

SEQ ID NO. 5114 

STRAIN M732 frame: 1

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK
PSKPKDSLSTPPGFPDLNTPPDEATKG..KRRY.R.IRSN.IC.VSST.LC..PYFS.QS
KWQNITSRI.WQISHS.KCFVS.SI..H.S.NLS...SQ.RNYYHNRLFCRYKICQYHSG
.LFEQYYF.AISYW.NSRLPCHCIFKLCCY.RQGW.DLC.R.IARNFSYSA.R.IC.DWY
.ITK.CQTY..FICSSFE.G.NC..YLEK..TRH.SQQNLPIKIQPDKSSSRVYY..H.L
KFRNHDHFQRWKDARIG.TKRCFFGYKRYGHE.V.NYSTWTKGF.I.GTTYCKNWNS.IR
YVFQTISRPSFNY.KNIPYPKWCSK.IEKI.L.FWFN.KSDRWILYL.RCN.P.I.INQW
CKS.SCL.RARRSI.SSERRYD.KR.TAQSFNSSQ.KYSKSNLC.Y.LVTL..GYCEWKR
SW.R..VTFN.RMDNICIT.NRKFIKC.KFDYGDG.CK.ESSTTSFKS.U.K.AYEGYA
TYYAKRFSVLRNK.QSSPSN.SHCRY.T.F.CC.RSECSY.KYDDETVCSCWTTR.SC..
T.IPISISLNSCLIGNC..GNSKW.GNHSIWYYRSHQGW..KQAC.SQNGE.KWRHARNP
CYYSR.RLD.SNKTINEWT.STLCR.TI.VPG.ITT.SF.HLD.G.SGNRSRRESKYCSS
HVL.PISSRA.HSSC.T.FDF.YCSYPHRCQR.LSKTKIISR.FIT.IC..NRSL.F.KW
CRNH.RYDSTTRIWR.YY.VICC.LIKLSS..DPSYL.KPF.C.SKPNDS.QRS.SNCGY
VDEALSCSRNGRSLYINNRRSSKHK.IRNVNKR.SIDSLCKWWC..S.CSD.SS.LRSYS
.S.RST.S.RST.S.RST.S.RST.S.RST.S.RST.SRRST.S.RGT.NPRSTYS.RRL
QS..RSSN.YYS.SV.FTKD.DSFRSSYG.NRQ.TDNFTSDTC.K

SEQ ID NO. 5115 

STRAIN COH1 frame: 1 

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEATKG..KRRY.R.IRSN.IC.VSST.LC..PYFS.QS
KWQNITSRI.WQISHS.KCFVS.SI..H.S.NLS...SQ.RNYYHNRLFCRYKICQYHSG
.LFEQYYF.AISYW.NSRLPCHCIFKLCCY.RQGW.DLC.R.IARNFSYSA.R.IC.DWY
.ITK.CQTY..FICSSFE.G.NC..YLEK..TRH.SQQNLPIKIQPDKSSSRVYY..H.L
KFRNHDHFQRWKDARIG.TKRCFFGYKRYGHE.V.NYSTWTKGF.I.GTTYCKNWNS.IR
YVFQTISRPSFNY.KNIPYPKWCSK.IEKI.L.FWFN.KSDRWILYL.RCN.P.I.INQW
CKS.SCL.RARRSI.SSERRYD.KR.TAQSFNSSQ.KYSKSNLC.Y.LVTL..GYCEWKR
SW.R..VTFN.RMDNICIT.NRKFIKC.KFDYGDG.CK.ESSTTSFKS.U.K.AYEGYA
TYYAKRFSVLRNK.QSSPSN.SHCRY.T.F.CC.RSECSY.KYDDETVCSCWTTR.SC..
T.IPISISLNSCLIGNC..GNSKW.GNHSIWYYRSHQGW..KQAC.SQNGE.KWRHARNP
CYYSR.RLD.SNKTINEWT.STLCR.TI.VPG.ITT.SF.HLD.G.SGNRSRRESKYCSS
HVL.PISSRA.HSSC.T.FDF.YCSYPHRCQR.LSKTKIISR.FIT.IC..NRSL.F.KW
CRNH.RYDSTTRIWR.YY.VICC.LIKLSS..DPSYL.KPF.C.SKPNDS.QRS.SNCGY
VDEALSCSRNGRSLYINNRRSSKHK.IRNVNKR.SIDSLCKWWC..S.CSD.SS.LRSYS
.S.RST.S.RST.S.RST.S.RST.S.RST.S.RST.SRRST.S.RGT.NPRSTYS.RRL
QS..RSSN.YYS.SV.FTKD.DSFRSSYG.NRQ.TDNFTSDTC

SEQ ID NO. 5116 

STRAIN M781 frame: 1 

LNNKGVGGDGVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVY 
LQSVKYVGGGNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTK 
PSKPKDSLSTPPGFPDLNTPPDEATKG..KRRY.R.IRSN.IC.VSST.LC..PYFS.QS
KWQNITSRI.WQISHS.KCFVS.SI..H.S.NLS...SQ.RNYYHNRLFCRYKICQYHSG
.LFEQYYF.AISYW.NSRLPCHCIFKLCCY.RQGW.DLC.R.IARNFSYSA.R.IC.DWY
.ITK.CQTY..FICSSFE.G.NC..YLEK..TRH.SQQNLPIKIQPDKSSSRVYY..H.L
KFRNHDHFQRWKDARIG.TKRCFFGYKRYGHE.V.NYSTWTKGF.I.GTTYCKNWNS.IR
YVFQTISRPSFNY.KNIPYPKWCSK.IEKI.L.FWFN.KSDRWILYL.RCN.P.I.INQW
CKS.SCL.RARRSI.SSERRYD.KR.TAQSFNSSQ.KYSKSNLC.Y.LVTL..GYCEWKR
SW.R..VTFN.RMDNICIT.NRKFIKC.KFDYGDG.CK.ESSTTSFKS.U.K.AYEGYA
TYYAKRFSVLRNK.QSSPSN.SHCRY.T.F.CC.RSECSY.KYDDETVCSCWTTR.SC..
T.IPISISLNSCLIGNC..GNSKW.GNHSIWYYRSHQGW..KQAC.SQNGE.KWRHARNP
CYYSR.RLD.SNKTINEWT.STLCR.TI.VPG.ITT.SF.HLD.G.SGNRSRRESKYCSS
HVL.PISSRA.HSSC.T.FDF.YCSYPHRCQR.LSKTKIISR.FIT.IC..NRSL.F.KW
CRNH.RYDSTTRIWR.YY.VICC.LIKLSS..DPSYL.KPF.C.SKPNDS.QRS.SNCGY
VDEALSCSRNGRSLYINNRRSSKHK.IRNVNKR.SIDSLCKWWC..S.CSD.SS.LRSYS
.S.RST.S.RST.S.RST.S.RST.S.RST.S.RST.SRRSTVKLKRDIKPKKHL.LKKA
TKLITFIKLILQLKRLIYQRLRQFPQFIWLEQTINR.LHIRHML

SEQ ID NO. 5117 

STRAIN JM9130013 frame: 2 

GVQIYQYYIKMDNNKPYLSPKDKTTVEKLEDRWKKITFKVQDTGIGLKDVYLQSVKYVGG
GNNNLDLITPPGFKKEDKKVEKPKLDRPPGIDLPAPTSMRSFDYSTPPGTKPSKPKDSLS
TPPGFPDLNTPPDEAPKDSKKDAIEDKSGAIKYAKSLQLSFVDDPILASKVNGKILQVES
DGKLVIPRNALSANQFDDTSLKIYRNNNRNKEITITTDYFADTKYVNITAVDYLSNTTFE
QLATGETVDYHAIVFSSFAAIKDKGGKIYVNDKLQETSRIALKDKSVKIGIELPNDVRHI
DSLSVRRLNEVKTVDNILKNDEQDINLSKTYQLKYNPTNRRLEFTINNINSSSEIMTTFK
DGKMPELVEQKDVSLDINDMDMSKFKTIRLGRKDSEFKGQLIAKTGTVELDMFFKQSQDP
ASIIKKIYLIQNGVPNELKKFDSSFGLTESQIDGYYIYKDAINLKFKLTSGASLKVVYKG
QEDPYSHQKEDMTKXGEQLSHSTQANENTAKVTFANIDWSHYSKVTVNGKEVGKGSELPL
TKGWTTFVLHKTENSLNVKSLIMETGSVSKKVQQLPLSPRLSKNKHMRDMLLTMQKDSAY
YETSDSLVLRINLTADTKLNFNAVKGASALTENMMMRQFAVAGPQDDPVSEHKYPSVFLL
TPALLETASEATLNGKEITASGIIGHIKDGDKSKHVEVKMVNENGDMLGTPVIIQGKDLT
NRTKPLMSGRRVLYAGKQYEFRAKLPLSRFNTWIRVEVVTEAGEKASIVRRMFFDQSVPE
LNTAVAKRDLTSDTALIHIVAKDDSLKLKLYQDDSLLESVDKTGLYSFRNGVEITKDMTV
PLEFGDNIIKLSAVDLSNYRRNETLHIYRNRFDVKASQMTADKGAKVTVDMLMKHLVVPE
MAGAYTLTIDEAPNTNESGMLTNAKVSIHYVNGGVDKVDVPIKVVDLEAIRKAEEAHKAD
EARKAEEARKAEEAHKAEEVRKAEEAHKVEEAP.S.RGT.NPRSTYS.RRLQG..RSSN.
YYS.SV.FTKD.DSFRSSYG.NRQ.TDNFTSDTC

SEQ ID NO. 5201 
STRAIN 090 

AG C GAT AC C T T T AAT T T T GAT AT T GAG C AAAT T G C AG A 
CAAT GCT AT C ACTAAAAC AG AT AAAAC AAC AG AAAT T AT TT CC AAC C AG A 
C AAC AAG C C AAAC T GGG C AAAT TGCCTTTTTT G AAAAAC T AAC ACC AGC A 
CAAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGT 
CGGCGATCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCG 
TT AAT ACCACT GT T AAT CATAT CTTGT CT GAGCAGAAAAAAATT CAAATT 
C C T C AAG T TG AT GAT T T ACT AAAAAAT G C T AAT CG CG AAC TAAAT GGAT T 
TAT T GC C AAAT AT AAAG AT G C T AC T C C G G C AG AAT T Ag AG AAAAAAC C AA 
ACT T GAT T C AAAAAT TAT T C AAAC AAAG C AAGAC CT CG C T AC AGGAAT T T 
TAT T T T G AC T C AC AAAAC AT C GAG C AAAAAAT GGAT AT GAT G GC a G C G AA 
TGTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGtCTCTGCTGAAA 
T G C T CAT T G AAGAT AAT AC TAAAT C TAT T G AAAAT T T G GT T G G AGT TAT T 
GCTttTATTGAATCgAGTCAAGCCGAGGCTGCTAATCGtGCAaGCCACTT 
ACAACAAGAAATT CT AGCATTAGAT AGC C a AACGT cCGAGT ATCAAAT t A 
AAAGT a AC CAAT TAG C T CGAAT G ACT G AAGT TAT C AAT AC CC T CG AAC AG 
C AAC AT AC T GAAT AT GT C AG C CGT C T C T AC G T T G CAT GGG C AAC AAC AC C 
AC AG AT G C G AAAC T T G GT C AAAGT AT C G T C AG AT AT G C GT C AG AAAC T T G 
G C AT GT T AC G T CG AAAT AC CAT T C C AAC AAT G AAAC T C T CAAT C G C T C AG 
TTAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTAT 
T G T C AACG C T AAT AAT G C AG CAT T G C AG AT G C T G G CT G AAAC T AGT AAAG 
AAG C GAT T C C G AT GT T AGAG AAG AC C G C AC AAAG C C C C AC T GT T T CT AT T 
AAAT C T GT C AC T GC AT T AG C T G AAAG CT TAG T GGC T C AAAAT AAT GG TAT 
TATCGCTGCCATAGACAAAGGACGTAAGGAACGTGCCCaATTGGAATCTG 
CTGTTATTAAATCGGCTGAAACAAT CAAT GAT TCTGTCAAAATT CGT GAT 
AAAAAAAT AGT T G AAG C C T T ACT C AAC GAAG GT a AAT C T AC C C AAG AAAA 
AG T T GAT G AGT C T 

SEQ ID NO. 5202 
STRAIN A909 

AG CGAT AC C T T T AAT T T T GAT AT T G AC C AAAT T G C AG A 
CAAT G CT AT C ACT AAAAC AG AT AAAAC AAC AG AAAT T ATT T CC AAC C AG A 
CAACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCA 
CAAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGT 
CGGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCG 
TT AAT AC CACT GT T AAT CAT AT CTT GT CT GAG C AG AAAAAAAT T CAAATT 
C CT CAAGTTGAT GAT T TACT AAAAAAT GCT AAT CGCGAACTAAATGGATT 
TAT T G C C AAAT AT AAAG AT G C T ACT C C G G C AG AAT TAG AG AAAAAAC CAA 
ACT T GAT T C AAAAAT TAT T C AAAC AAAGC AAG AC C T C G C T AC AG GAAT T T 
TAT T TTGACT C ACAAAAC AT CGAGC AAAAAATGGAT AT GAT GGCAG CGAA 
T GT T G T C AAAC AAG AAG AT AC T T T G G C AAG AAAT AT CGTCTCTGCT G AAA 
TGCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTAwT 
GCTTTTATTGAATCGAGTCAAGCCGAGGCTGCCAATCGTGCAAGCCACTT 
AC AAC AAG AAAT T C T AG CAT TAG AT AG C C AAAC GT C C G AGT AT C AAAT T A 
AAAGT AAC CAAT TAG CT C GAAT G AC T GAAG T TAT CAAT AC C CT C G AAC AG 
C AAC AT ACT GAAT AT GT C AG C C G T C T C T AC GT T G CAT G GGC AAC AAC AC C 
ACAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAAACTTG
GCATGTTACGTCGAAATACCATTCCAACaATGAAACTCTCAATCGCTCAG 
TTAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTAT 
T GT C AACGC T AAT AAT G C AG CAT T G C AG AT G C T G G C T G AAAC T AGT AAAG 
AAGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATT 
AAAT CT GT C ACT G CAT TAG C T G AAAG C T T AGT GG C T C AAAAT AAT G GT AT 
TATCGCTGCCATAGACAAAGGACGTAAAGAACGTGCCCAATTAGAATCTG 
CTGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGAT 
AAAAAAAT AG T T G AAG C CT T ACT C AAC G AAG GT a AAT CT AC C C AAG AAAA 
AGtTGATGAGTCT 

SEQ ID NO. 5203 
STRAIN H36B 

AGCGaTACCTTTAATTTTGATATTGACCAAATTGCAGAC 
AAT G C T AT C ACT AAAAC AG AT AAAAC AAC AGAAAT T AT T T C C AAC C AG AC 
AAC AAG C C AAAC T GG GC AAAT TGCCTTTTTT G AAAAACT AAC AC C AG C AC 
AAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGTC 
GGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGT 
T AAT AC C ACT G T T AAT CAT AT C T T GT C T GAG C AGAAAAAAAT T C AAAT T C 
CT C AAGT T GAT GAT T T ACT AAAAAAT G CT AAT CGC G AAC T AAAT GG AT T T 
AT T G C C AAAT AT AAAGAT G C T AC T C C G G C AGAAT T AG AG AAAAAAC C AAA 
C T T GAT T C AAAAAT TAT T C AAAC AAAG C AAG AC C T C G CT AC AGG AAT T T T 
AT T T T G AC T C AC AAAAC AT C GAG C AAAAAAT G GAT AT GAT G G C AGCG AAT 
GTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTcTCTGCTGAAAT 
GCTCATTGAAGATAATACT AAAT CT ATT GAAAATTTGGTTGGAGTTATTG 
CTttTATTGAATCGAGTCAAGCCGAgGCTGCCAATCGTGCAAGCCACTTA 
C AAC AAG AAAT TCTAGC ATT AGAT AGCCAAACGTcCG AGT AT CAAATTAA 
AAGT AAC C AAT TAG C T C G AAT G AC T G AAGT TAT C AAT AC C C T CGAAC AGC 
AACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACCA 
CAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAAACTTGG 
CAT GT T ACGT C G AAAT AC CAT T C C AAC a AT G AAAC T C T C AAT C G CT C AGT 
TAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTATT 
GT C AAC G C T AAT AAT G C AG CAT T G C AG AT G C T GGC T G AAAC T AGT AAAG A 
AG C GAT T C CG AT G T T AG AG AAGAC C GC AC AAAG C C C C AC T GT T T C T AT T A 
AATCTGTCACTGCATTATCTGAAAGCTTAGTGGCTCAAAATAATGGTATT 
AT C G CT G C CAT AG AC AAAG GAC GT AAAG AACGT G C C C AAT TAG AAT C T GC 
TGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGATa 
AAAAAAT AG T T G AAG C CT T AC T C Aa C G AAG GT a AAT C T AC C C AAG AAAAA 
GT T GAT G AGT C T 

SEQ ID NO. 5204 
STRAIN 18RS21 

T T T T GAT AT T G AC C AAAT T G C AG AC AAT GC TAT C AC T AAAAC AGAT AAAA 
C AAC AGAAAT TAT T T C C AAC C AG AC AAC AAG C C AAAC T G G GC AAAT T G C C 
TTTTTTGAAAAACTAACACCAGCACAAAAGTCTGCTATCTCTGAAAAAAC 
ACCAGCTTTGGTAGATACTTTTGTCGGCGATCAAAATGCGCTCCTTGATT 
T T G GAC AAT C C GC AGT AG AAG G CG T T AAT ACC AC T G T T AAT CAT AT CT T G 
T C T GAG C AG AAAAAAAT T C AAAT T C C T C AAGT T GAT GAT T T ACT AAAAAA 
T G CT AAT CG C G AAC T AAAT GG AT T T AT T GC C AAAT AT AAAG AT G C T AC T C 
CGGCAGAATTAGAGAAAAAACCAAACTTGATTCAAAAATTATTCAAACAA 
AG C AAG AC CT C G C T AC AGG AAT T T T ATT T T G ACT C AC AAAAC AT C GAG C A 
AAAAATGGATATGATGGCAGCGAATGTTGTCAAACAAGAAGATACTTTGG 
CAAG AAAT AT C GT CT CTGCTGAAATGCT CAT T GAAGAT AAT ACT AAAT CT 
ATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGAATCGAGTCAAGCCGA 
GG C T G C T AAT C GT G CAAG C C AC T T AC AAC AAG AAAT T C TAG CAT TAG AT A 
G C C AAAC GT C C G AGT AT C AAAT T AAAAG T AAC C AAT TAG CT C G AAT GAC T 
G AAGT T AT C AAT AC C C T C G AAC AG C AAC AT C CT G AAT AT G T C AG C C G T CT 
C T AC G T T G C AT GG G C AAC AAC AC C AC AG AT G C G AAAC T T G G T C AAAG TAT 
CGTCAGATATGCGTCAGAAACTTGGCATGTTACGTCGAAATACCATTCCA 
ACAATGAAACTCTCAATCGCTCAGTTAGGCATGATGCAACAATCTGTCAA 
ATCCGGTGTCACTGCTGATGCTATTGTCAACGCTAATAATGCAGCATTGC 
AGAT G C T G G C T G AAAC TAG T AAAG AAG C GAT T C C GAT G T TAG AG AAG AC C 
GC ACAAAGCCCC ACT GTTTCT ATT AAAT CTGTC ACT GC ATT AGCTG AAAG 
C T T AGT G GC T C AAAAT AAT GGT AT TAT C GC T G C CAT AG AC AAAG G AC GT A 
AGGAACGTGCCCaATTGGAATCTGCTGTTATTAAATCGGCTGAAACAATC
AAT GAT T CT G T C AAAAT T C GT G AT AAAAAAAT AGT T GAAGC C T T AC T C AA 
C GAAGGT a AAT CT AC C C AAG AAAAAGT T GAT GAG T C T 

SEQ ID NO. 5205 
STRAIN M732 

AGCG AT AC CT T T AAT T T T GAT AT T G AC C AAAT T G C AG AC 
AAT G C TAT C ACT AAAAC AG AT AAAAC AAC AG AAAT TAT T T C C AAC C AG AC 
AAC AAG C C AAAC T GG G C AAAT TGCCTTTTTT GAAAAAC T AAC AC C AG C AC 
AAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGTC 
GGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGT 
T AAT AC T AC TGT T AAT CAT AT CT T GT C T GAG C AGAAAAAAAT T C AAAT T C 
C T C AAGT T GAT GAT T T AC T AAAAAAT GCT AAT C G C G AAC T AAAT GG AT T T 
AT T G C C AAAT AT AAAG AT G CT AC T C C G G C AGAAT T AGAG AAAAAAC C AAA 
C T T GAT T C AAAAAT TAT T C AAAC AAAG C AAG AC C T C G CT AC AG G AAT T T T 
AT T T T GAC T C AC AAAAC AT C GAG C AAAAAAT G GAT AT G AT GG C AG C AAAT 
GT T GT C AAAC AAG AAG AT ACT T T GG C AAG AAAT AT CGTCTCTGCT G AAAT 
GCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATTG 
CTTTTATTGAATCGAGTCAAGCCGAGGCTGCCAATCGTGCAAGCCACTTA 
C AAC AAGAAAT T CT AG CAT T AGAT AG C C AAAC GT C C GAAT AT C AAAT T AA 
AAGT AAC C AAT TAG C C CGAAT G AC T G AAGT TAT C AAT ACC C T C G AAC AGC 
AAC AT ACGG AAT AT GT C AG C C GT C T C T AC GT T G C AT GG GC AAC AAC AC C A 
C AG AT G C G AAAC T T G G T C AAAGT AT C GT C AG AT AT GC GT C AG AAAC T T GG 
TAT GT T AC GT C G AAAT AC CAT T C C AAC AAT G AAAC T C T C AAT CGC T C AGT 
T AGG CAT GAT G C AAC AAT C T GT C AAAT C C GGT GT C AC T G CT G AT G CT AT T 
G T C AAC G C T AAT AAT G C AG CAT T G C AAAT GC T GGCT G AAAC T AGT AAAG A 
AGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATTA 
AATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTATT 
AT CG CT GC CAT AG AC AAAGG AC GT AAG G AAC G T G C C C AAT TAG AAT C T G C 
TGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGATA 
AAAAAAT AGT T GAAG C C T T AC T C AAC G AAG GT AAAT C T AC C C AAG AAAAA 
G 

SEQ ID NO. 5206 
STRAIN COH1 

C T AAAAC AG AT AAAAC AAC AG AAAT TAT T T C C AAC C AGAC AAC AAG C C AA 
AC T G G G C AAAT TGCCTTTTTT G AAAAACT AAC AC C AG C AC AAAAGT C T G C 
T wT CT CTG AAAAAAC ACC AGCTTT GGT AGAT ACT TTTGTCGGTG ACC AAA 
ATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGTTAATACTACT 
GT T AAT CAT AT CT TGTCTGAGC AGAAAAAAAT TC AAAT TCCTCAAGTTG A 
T GAT T TACT AAAAAAT G CT AAT CGC G AAC T AAAT G GAT T TAT T G C C AAAT 
AT AAAGAT G C T AC T C CGG C a GAAT TAG AG AAAAAAC C AAAC T T GAT T C AA 
AAAT TAT T C AAAC AAAG C AAG AC C T CG CT AC AGG AAT T T TAT T T T GAC T C 
AC AAAAC AT C GAG C AAAAAAT GG AT AT GAT GG C AG C AAAT GT T GT C AAAC 
AAGAAGAT ACT T T GGCAAG AAAT AT CGT CT CT GCT GAAATGCT C AT T GAA 
GATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGA 
AT C G AGT C AAG C CGAgG C T GC C AAT C G T G C a AGC C ACT T AC AAC Aa G AAA 
T T CT AG C a T T AGAT AG C C AAACGT C C GAAT AT C AAAT T AAAAGT AAC C AA 
T TAG C C C GAAT GAC T GAa GT TAT C Aa T a C C C T C G AAC AG C AAC AT AC G G A 
a T AT GT C AG C C G T CT C T AC GT T GC AT GGG C AAC AAC AC C AC AG AT G CG AA 
ACTTGGTCAAAGTATCGTCAGATATGCGTCAGAAACTTGGTATGTTACGT 
CGAAATACCATTCCAACAATGAAACTCTCAATCGCTCAGTTAGGCATGAT 
GCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTATTGTCAACGCTA 
AT AAT GC AG C AT T G C AAAT G C T GGC T G AAACT AGT AAAG AAG CG AT T C C G 
AT G T TAG AG AAG AC C G C AC AAAG C C C C ACT GT T T C TAT T AAAT C T G T C AC 
TGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTATTATCGCTGCCA 
TAGACAAAGGACGTAAGGAACGTGCCCAATTAGAATCTGCTGTTATTAAA 
T CGGCT GAAAC AAT C AAT GATT CTGT CAAAATTCGTGAT AAAAAAAT AGT 
T GAAG C C T T AC T C Aa C GAAG GT AAAT C T AC C C AAG AAAAAG T T GAT G AG T 
CT 

SEQ ID NO. 5207 
STRAIN M781 

TTTTGATATTGACCAAATTGCAGACAATGCTATCACTAAAACAGATAAAA
CAACAGAAATTATTTCCAACCAGACAACAAGCCAAACTGGGCAAATTGCC
TTTTTTGAAAAACTAACACCAGCACAAAAGTCTGCTATCTCTGAAAAAAC
ACCAGCTTTGGTAGATACTTTTGTCGGTGACCAAAATGCGCTCCTTGATT
TTGGACAATCCGCAGTAGAAGGCGTTAATACTAGTGtTAATCATATCTTG
TCTGAGCAGAAAAAAATTCAAATTCCTCAAGTTGATGATTTACTAAAAAA
TGCTAATCGCGAACTAAATGGATTTATTGCCAAATATAAAGATGCTACTC
CGGCAGAATTAGAGAAAAAACCAAACTTGATTCAAAAATTATTCAAACAA
AGCAAGACCTCGCTACAGGAATTTTATTTTGACTCACAAAACATCGAGCA
AAAAATGGATATGATGGCAGCAAATGTTGTCAAACAAGAAGATACTTTGG
CAAGAAATATCGTCTCTGCTGAAATGCTCATTGAAGATAATACTAAATCT
ATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGAATCGAGTCAAGCCGA
GGCTGCCAATCGTGCAAGCCACTTACAACAAGAAATTCTAGCATTAGATA
GCCAAACGTCCGAATATCAAATTAAAAGTAACCAATTAGCCCGAATGACT
GAAGTTATCAATACCCTCGAACAGCAACATACGGAATATGTCAGCCGTCT
CTACGTTGCATGGGCAACAACACCACAGATGCGAAACTTGGTCAAAGTAT
CGTCAGATATGCGTCAGAAACTTGGTATGTTACGTCGAAATACCATTCCA
ACAATGAAACTCTCAATCGCTCAGTTAGGCATGATGCAACAATCTGTCAA
ATCCGGTGTCACTGCTGATGCTATTGTCAACGCTAATAATGCAGCATTGC
AAATGCTGGCTGAAACTAGTAAAGAAGCGATTCCGATGTTAGAGAAGACC
GCACAAAGCCCCACTGTTTCTATTAAATCTGTCACTGCATTAGCTGAAAG
CTTAGTGGCTCAAAATAATGGTATTATCGCTGCCATAGACAAAGGACGTA
AGGAACGTGCCCAATTAGAATCTGCTGTTATTAAATCGGCTGAAACAATC
AATGATTCTGTCAAAATTCGTGATAAAAAAATAGTTGAAGCCTTACTCAA
CGAAGGTAAATCTACCCAAGAAAAAGTTGATGAGTCT

SEQ ID NO. 5208 
STRAIN CJB110 

TTTTGATATTGACCAAATTGCAGACAATGCTATCACTAAAACAGATAAAA
CAACAGAAATTATTTCCAACCAGACAACAAGCCAAACTGGGCAAATTGCC
TTTTTTGAAAAACTAACACCAGCACAAAAGTCTGCTATCTCTGAAAAAAC
ACCAGCTTTGGTAGATACTTTTGTCGGCGATCAAAATGCGCTCCTTGATT
TTGGACAATCCGCAGTAGAAGGCGTTAATACCACTGTTAATCATATCTTG
TCTGAGCAGAAAAAAATTCAAATTCCTCAAGTTGATGATTTACTAAAAAA
TGCTAATCGCGAACTAAATGGATTTATTGCCAAATATAAAGATGCTACTC
CGGCAGAATTAGAGAAAAAACCAAACTTGATTCAAAAATTATTCAAACAA
AGCAAGACCTCGCTACAGGAATTTTATTTTGACTCACAAAACATCGAGCA
AAAAATGGATATGATGGCAGCGAATGTTGTCAAACAAGAAGATACTTTGG
CAAGAAATATCGTCTCTGCTGAAATGCTCATTGAAGATAATACTAAATCT
ATTGAAAATTTGGTTGGAGTTATTGCTTTTATTGAATCGAGTCAAGCCGA
GGCTGCTAATCGTGCAAGCCACTTACAACAAGAAATTCTAGCATTAGATA
GCCAAACGTCCGAGTATCAAATTAAAAGTAACCAATTAGCTCGAATGACT
GAAGTTATCAATACCCTCGAACAGCAaCATACTGAATATGTCAGCCGTCT
CTACGTTGCATGGGCaACaACACCACAGATGCGAAACTTGGTCAAAGTAT
CGTCAGATATGCGTCAGAAACTTGGCATGTTACGTCGAAATACCATTCCA
ACAATGAAACTCTCAATCGCTCAGTTAGGCATGATGCAACAATCTGTCAA
ATCCGGTGTCACTGCTGATGCTATTGTCAACGCTAATAATGCAGCATTGC
AGATGCTGGCTgAAACTAGTAAAGAAGCGATTCCGATGTTAGAGAAGACC
GCACAAAGCCCCACTGTTTCTATTAAATCTGTCACTGCATTAGCTGAAAG
CTTAGTGGCTCAAAATAATGGTATTATCGCTGCCATAGACAAAGGACGTA
AGGAaCGTGCCCAATTGGAATCTGCTGTTATTAAATCGGCTGAAACAATC
AATGATTCTGTCAAAATTCGTGATaAAAAAATAGTTGAAGCCTTACTCAA
CGAAGGTAAATCTACCCAAGAAAAAGTTGATGAGTCT

SEQ ID NO. 5209 
STRAIN 1169NT 

GCAGACAATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAA
CCAGACAACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACAC
CAGCACAAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACT
TTTGTCGGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGA
AGGCGTTAATACCACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTC
AAATTCCTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAAT
GGATTTATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAGAGAAAAA
ACCAAACTTGATCCAAAAATTATTCAAACAAAGCAAGACCTCACTACAGG
AATTTTATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCA
GCAAATGTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTCTCTGC
TGAAATGCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAG
TTATTGCTTTTATTGAATCGAGTCAAGCCGAGGCTGCCAATCGTGCAAGC
CACTTAGAACAAGAAATTCTAGCATTAGATAGCCAAACGTCCGAGTATCA
AATTAAAAGTAACCAATTAGCTCGAATGACTGAAGTTATCAATACCCTCG
AaCAGCAACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACA
aCACCACAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAA
ACTTGGCATGTTACGTCGAAATACCATTCCAACAATGAAACTCTCAATCG
CTCAGTTAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGAT
GCTATTGTCAACGCTAATAATGCAGCATTGCAGATGCTGGCTGAAACTAG
TAAAGAAGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTT
CTATTAAATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAAT
GGTATTATCGCTGCCATAGACAAAGGACGTAAGGAACGTGCCCAATTAGA
ATCTGCTGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTC
GTGATAAAAAAATAGTTGAAGCCTTACTCAACGAAGGTaAATCTACCCAA
GAAAAAGTTGATGAGTCT

SEQ ID NO. 5210 
STRAIN JM9130013 

AGCGATACCTTTAATTTTGATATTGACCAAATTGCAGAC
AATGCTATCACTAAAACAGATAAAACAACAGAAATTATTTCCAACCAGAC
AACAAGCCAAACTGGGCAAATTGCCTTTTTTGAAAAACTAACACCAGCAC
AAAAGTCTGCTATCTCTGAAAAAACACCAGCTTTGGTAGATACTTTTGTC
GGTGACCAAAATGCGCTCCTTGATTTTGGACAATCCGCAGTAGAAGGCGT
TAATACCACTGTTAATCATATCTTGTCTGAGCAGAAAAAAATTCAAATTC
CTCAAGTTGATGATTTACTAAAAAATGCTAATCGCGAACTAAATGGATTT
ATTGCCAAATATAAAGATGCTACTCCGGCAGAATTAGAGAAAAAACCAAA
CTTGATTCAAAAATTATTCAAACAAAGCAAGACCTCGCTACAGGAATTTT
ATTTTGACTCACAAAACATCGAGCAAAAAATGGATATGATGGCAGCGAAT
GTTGTCAAACAAGAAGATACTTTGGCAAGAAATATCGTCTCTGCTGAAAT
GCTCATTGAAGATAATACTAAATCTATTGAAAATTTGGTTGGAGTTATTG
CTTTTATTGAATcGAGTCAAGCCGAGGCTGCCAATCGTGCAAGCCACTTA
CAACAAGAAATTCTAGCATTAGATAGCCAAACGTCCGAGTATCAAATtAA
AAGTaACCAATTAGCTCGAATGACTGAAGTTATCAATACCCTCGAACAGC
AACATACTGAATATGTCAGCCGTCTCTACGTTGCATGGGCAACAACACCA
CAGATGCGAAACTTGGTCAAAGTATCGTCAGATATGCGTCAAAAACTTGG
CATGTTACGTCGAAATACCATTCCAACAATGAAACTCTCAATCGCTCAGT
TAGGCATGATGCAACAATCTGTCAAATCCGGTGTCACTGCTGATGCTATT
GTCAACGCTAATAATGCAGCATTGCAGATGCTGGCTGAAACTAGTAAAGA
AGCGATTCCGATGTTAGAGAAGACCGCACAAAGCCCCACTGTTTCTATTA
AATCTGTCACTGCATTAGCTGAAAGCTTAGTGGCTCAAAATAATGGTATT
ATCGCTGCCATAGACAAAGGaCGTAAGGAACGTGCCCAATTAGAATCTGC
TGTTATTAAATCGGCTGAAACAATCAATGATTCTGTCAAAATTCGTGATA
AAAAAATAGTTGAAGCCTTACTCAACGAAGGTaAATCTACCCAAGAAAAA
GTTGATGAGTCT

SEQ ID NO. 5211 
STRAIN 2603 

agcgatacctttaattttgatattgaccaaattgcagacaatgctatcac 
taaaacagataaaacaacagaaattatttccaaccagacaacaagccaaa 
ctgggcaaattgccttttttgaaaaactaacaccagcacaaaagtctgct 
atctctgaaaaaacaccagctttggtagatacttttgtcggcgatcaaaa 
tgcgctccttgattttggacaatccgcagtagaaggcgttaataccactg 
ttaatcatatcttgtctgagcagaaaaaaattcaaattcctcaagttgat 
gatttactaaaaaatgctaatcgcgaactaaatggatttattgccaaata 
taaagatgctactccggcagaattagagaaaaaaccaaacttgattcaaa 
aattattcaaacaaagcaagacctcgctacaggaattttattttgactca 
caaaacatcgagcaaaaaatggatatgatggcagcgaatgttgtcaaaca 
agaagatactttggcaagaaatatcgtctctgctgaaatgctcattgaag 
ataatactaaatctattgaaaatttggttggagttattgcttttattgaa 
tcgagtcaagccgaggctgctaatcgtgcaagccacttacaacaagaaat 
tctagcattagatagccaaacgtccgagtatcaaattaaaagtaaccaat 
tagctcgaatgactgaagttatcaataccctcgaacagcaacatcctgaa 
tatgtcagccgtctctacgttgcatgggcaacaacaccacagatgcgaaa
cttggtcaaagtatcgtcagatatgcgtcagaaacttggcatgttacgtc
gaaataccattccaacaatgaaactctcaatcgctcagttaggcatgatg 
caacaatctgtcaaatccggtgtcactgctgatgctattgtcaacgctaa 
taatgcagcattgcagatgctggctgaaactagtaaagaagcgattccga 
tgttagagaagaccgcacaaagccccactgtttctattaaatctgtcact 
gcattagctgaaagcttagtggctcaaaataatggtattatcgctgccat 
agacaaaggacgtaaggaacgtgcccaattggaatctgctgttattaaat 
cggctgaaacaatcaatgattctgtcaaaattcgtgataaaaaaatagtt 
gaagccttactcaacgaaggtaaatctacccaagaaaaagttgatgagtc 
t 

SEQ ID NO. 5212 

STRAIN 090 frame: 1

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES

SEQ ID NO. 5213

STRAIN A909 frame: 1 

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVXAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES

SEQ ID NO. 5214 

STRAIN H36B frame: 1

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALSESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES

SEQ ID NO. 5215 

STRAIN 18RS21 frame: 2 

FDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGD
QNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAEL
EKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDN
TKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLE
QQHPEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVK
SGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIA
AIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5216 

STRAIN M732 frame: 1

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEK

SEQ ID NO. 5217 

STRAIN COH1 frame: 3 

KTDKTTEIISNQTTCQTGQIAFFEKLTPAQKSAXSEKTPALVDTFVGDQNALLDFGQSAV
EGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAELEKKPNLIQKLFK
QSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDNTKSIENLVGVIA
FIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLEQQHTEYVSRLYV
AWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVKSGVTADAIVNAN
NAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIAAIDKGRKERAQL
ESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5218 

STRAIN COH1 frame: 3 

KTDKTTEIISNQTTCQTGQIAFFEKLTPAQKSAXSEKTPALVDTFVGDQNALLDFGQSAV 
EGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAELEKKPNLIQKLFK 
QSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDNTKSIENLVGVIA
FIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLEQQHTEYVSRLYV
AWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVKSGVTADAIVNAN
NAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIAAIDKGRKERAQL
ESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5219 

STRAIN M781 frame: 2

FDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGD
QNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAEL
EKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDN
TKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLE 
QQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVK 
SGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIA 
AIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5220 

STRAIN CJB110 frame: 2 

FDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGD
QNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAEL 
EKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDN 
TKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLE 
QQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVK
SGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIA
AIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5221 

STRAIN 1169NT frame: 1 

ADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVDTFVGDQNALLD 
FGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDATPAELEKKPNL 
IQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEMLIEDNTKSIEN
LVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEVINTLEQQHTEY
VSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMMQQSVKSGVTAD
AIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQNNGIIAAIDKGR
KERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES 

SEQ ID NO. 5222 

STRAIN JM9130013 frame: 1 

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD 
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA 
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHTEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES

SEQ ID NO. 5223 

STRAIN 2603 frame: 1

SDTFNFDIDQIADNAITKTDKTTEIISNQTTSQTGQIAFFEKLTPAQKSAISEKTPALVD
TFVGDQNALLDFGQSAVEGVNTTVNHILSEQKKIQIPQVDDLLKNANRELNGFIAKYKDA
TPAELEKKPNLIQKLFKQSKTSLQEFYFDSQNIEQKMDMMAANVVKQEDTLARNIVSAEM
LIEDNTKSIENLVGVIAFIESSQAEAANRASHLQQEILALDSQTSEYQIKSNQLARMTEV
INTLEQQHPEYVSRLYVAWATTPQMRNLVKVSSDMRQKLGMLRRNTIPTMKLSIAQLGMM
QQSVKSGVTADAIVNANNAALQMLAETSKEAIPMLEKTAQSPTVSIKSVTALAESLVAQN
NGIIAAIDKGRKERAQLESAVIKSAETINDSVKIRDKKIVEALLNEGKSTQEKVDES

SEQ ID NO. 5301 
STRAIN 2603 

acaaatactttgaaaaaagaattagttgaagctaaaaagacaattccatc
cgtaaaagcttcaaaagtaccgcaaaaatcaacatcatcgaaagataaag
agtttgttcttaaaccgattatcgatgtctctggttggcaacttcctaag
gagattgattacgatacgctttcaaaaaatatttcaggtgttgttattcg
tgtctttggtggatcaaagatatctaagactaataacgctgcttatacaa
ctggaatcgataaatcgtttaagacccatatcaaagaatttcaaaagcga
aatatcccagtagctgtctacagttatgcacttggttcaagtgttaaaga
aatgaaagaagaggctcagatattttataagaatgcagctccttacaaac
caactttttattggattgacgtagaagaggagacaatgtctaacatgaat
aaaggtgtccaagcattccgaaaagaattaaaaagacttggtgctaaaaa
tgttggtatctacattggtacttactttatgactgagcaaggcatctctg
taaaaggatttgacgctgtttggattccaacttatggtagcgattctgga
tactatgaagcggctccgcaaactgaacttaaatacgatttacaccaata
cacctctcaaggttatctaccaggawtcaatcaaccgcttgatttaaatc
aaattgcagttaataaagacaagaagaaaacttatgagaaactttttgga
aaagtaaaagag 

SEQ ID NO. 5302 
STRAIN 090 

ACAAATACTTTGAAAAAAGAATTAG
TTGAAGCTAAAAAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAA
AAATCAACATCATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGA
TGTCTCTGGTTGGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAA
AAAATATTTCAGGTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCT
AAGACTAATAACGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGAC
CCATATCAAAGAATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTT
ATGCACTTGGTTCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTT
TATAAGAATGCAGCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGA
AGAGGAGACAATGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAG
AATTAAAAAGACTTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTAC
TTTATGACTGAGCAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGAT
TCCAACTTATGGTAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTG
AACTTAAATACGATTTACACCAATACACCTCTCAAGGTTATCTACCAGGA
TTCAATCAACCGCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAA
GAAAACTTATGAGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5303 
STRAIN A909 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAA
AGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATCA
TCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTG
GCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAG
GTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCTAAGACTAATAAC
GCTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGA
ATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTT
CAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCA
GCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAAT
GTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGAC
TTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTACTTTATGACTGAG
CAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGG
TAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTGAACTTAAATACG
ATTTAGACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCG
CTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGA
GAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5304 
STRAIN H36B 

ACAAATACTTTGAAAAAAGAATTAG
TTGAAGCTAAAAAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAA
AAATCAACATCATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGA
TGTCTCTGGTTGGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAA
AAAATATTTCAGGTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCT
AAGACTAATAACGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGAC
CCATATCAAAGAATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTT
ATGCACTTGGTTCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTT
TATAAGAATGCAGCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGA
AGAGGAGACAATGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAG
AATTAAAAAGACTTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTAC
TTTATGACTGAGCAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGAT
TCCAACTTATGGTAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTG
AACTTAAATACGATTTACACCAATACACCTCTCAAGGTTATCTACCAGGA
TTCAATCAACCGCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAA
GAAAACTTATGAGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5305 
STRAIN 18RS21 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAAA
GACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATCAT
CGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTGG
CAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAGG
TGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCTAAGACTAATAACG
CTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGAA
TTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTTC
AAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCAG
CTCCTTACAAACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAATG
TCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGACT
TGGTGCTAAAAATGTTGGTATCTACATTGGTACTTACTTTATGACTGAGC
AAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGGT
AGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTGAACTTAAATACGA
TTTAGACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCGC
TTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGAG
AAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5306 
STRAIN M732 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAA
AAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATC
ATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTT
GGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCA
GGTGTTGTTATTCGTATCTTTGGTGGATCAAAGATATCTAAGACTAATAA
CGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAG
AATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGT
TCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGC
AGCTCCTTACAAaCCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAA
TGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAGTTAAAAAGA
CTTGGTGCTAAAAATGTTGGTATCTACATCGGTACTTACTTTATGACTGA
GCAAGGTATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATG
GTAGCGATTCTGGATACTATGAAGCAGCTCCACAAACTGAACTTAAATAC
GATTTACACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACC
GCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATG
AGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5307 
STRAIN COH1 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAA
AGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATCA
TCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTG
GCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAG
GTGTTGTTATTCGTATCTTTGGTGGATCAAAGATATCTAAGACTAATAAC
GCTGCTTATACAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGA
ATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTT
CAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCA
GCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAAT
GTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAGTTAAAAAGAC
TTGGTGCTAAAAATGTTGGTATCTACATCGGTACTTACTTTATGACTGAG
CAAGGTATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGG
TAGCGATTCTGGATACTATGAAGCAGCTCCACAAACTGAACTTAAATACG
ATTTACACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCG
CTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGA
GAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5308 
STRAIN M781 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAA
AAGACAATTCCATCcGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATC
ATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTT
GGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCA
GGTGTTGTTATTCGTATCTTTGGTGGATCAAAGATATCTAAGACTAATAA
CGCTGCTTATACAACTGGAATCGATAAATcGTTTAAGACCCATATCAAAG
AATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGT
TCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGC
AGCTCCTTACAAACCAACTTTTTatTGGATTGACGTAGAAGAGGAGaCAA
TGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAGAGTTAAAAAGA
CTTGGTGCTAAAAATGTTGGTATCTACATCGGTACTTACTTTATGACTGA
GCAAGGTATCTCTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATG
GTAGCGATTCTGGATACTATGAAGCAGCTCCACAAACTGAACTTAAATAC
GATTTACACCAATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACC
GCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATG
AGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5309 
STRAIN CJB110 

AAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAAAGACAATTCCATCCG
TAAAAGCTTCAAAAGTACCGCAAAAATCAACATCATCGAAAGATAAAGAG
TTTGTTCTTAAACCGATTATCGATGTCTCTGGTTGGCAACTTCCTAAGGA
GATTGATTACGATACGCTTTCAAAAAATATTTCAGGTGTTGTTATTCGTG
TCTTTGGTGGATCAAAGATATCTAAGACTAATAACGCTGCTTATACAACT
GGAATCGATAAATCGTTTAAGACCCATATCAAAGAATTTCAAAAGCGAAA
TATCCCAGTAGCTGTCTACAGTTATGCACTTGGTTCAAGTGTTAAAGAAA
TGAAAGAAGAGGCTCAGATATTTTATAAGAATGCAGCTCCTTACAAACCA
ACTTTTTATTGGATTGACGTAGAAGAGGAGACAATGTCTAACATGAATAA
AGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGACTTGGTGCTAAAAATG
TTGGTATCTACATTGGTACTTACTTTATGACTGAGCAAGGCATCTCTGTA
AAAGGATTTGACGCTGTTTGGATTCCAACTTATGGTAGCGATTCTGGATA
CTATGAAGCGGCTCCGCAAACTGAACTTAAATACGATTTACACCAATACA
CCTCTCAAGGTTATCTACCAGGATTCAATCAACCGCTTGATTTAAATCAA
ATTACAGTTAATAAAGACAAGAAGAAAACTTATGAGAAACTTTTTGGAAA
AGTAAAAGAG 

SEQ ID NO. 5310 
STRAIN 1169NT 

ACAAATACTTTGAAAAAAGAATTAGTTGAAGCTAAAAAGACAATTCC
ATCCGTAAAAGCTTCAAAAGTACCGCAAAAATCAACATCATCGAAAGATA
AAGAGTTTGTTCTTAAACCGATTATCGATGTCTCTGGTTGGCAACTTCCT
AAGGAGATTGATTACGATACGCTTTCAAAAAATATTTCAGGTGTTGTTAT
TCGTGTCTTTGGTGGATCAAAGATATCTAAGACTAATAACGCTGCTTATA
CAACTGGAATCGATAAATCGTTTAAGACCCATATCAAAGAATTTCAAAAG
CGAAATATCCCAGTAGCTGTCTACAGTTATGCACTTGGTTCAAGTGTTAA
AGAAATGAAAGAAGAGGCTCAGATATTTTATAAGAATGCAGCTCCTTACA
AACCAACTTTTTATTGGATTGACGTAGAAGAGGAGACAATGTCTAACATG
AATAAAGGTGTCCAAGCATTCCGAAAAGAATTAAAAAGACTTGGCGCTAA
AAATGTTGGTATCTACATCGGTACTTACTTTATGACTGAGCAAGGTATCT
CTGTAAAAGGATTTGACGCTGTTTGGATTCCAACTTATGGTAGCGATTCT
GGATACTATGAAGCAGCTCCGCAAACTGAACTTAAATACGATTTACACCA
ATACACCTCTCAAGGTTATCTACCAGGATTCAATCAACCGCTTGATTTAA
ATCAAATTGCAGTTAATAAAGACAAGAAGAAAACTTATGAGAAACTTTTT
GGAAAAGTAAAAGAG

SEQ ID NO. 5311
STRAIN JM9130013 

ACAAATACTTTGAAAAAAGAATTAG
TTGAAGCTAAAAAGACAATTCCATCCGTAAAAGCTTCAAAAGTACCGCAA
AAATCAACATCATCGAAAGATAAAGAGTTTGTTCTTAAACCGATTATCGA
TGTCTCTGGTTGGCAACTTCCTAAGGAGATTGATTACGATACGCTTTCAA
AAAATATTTCAGGTGTTGTTATTCGTGTCTTTGGTGGATCAAAGATATCT
AAGACTAATAACGCTGCTTATACAACTGGAATCGATAAATCGTTTAAGAC
CCATATCAAAGAATTTCAAAAGCGAAATATCCCAGTAGCTGTCTACAGTT
ATGCACTTGGTTCAAGTGTTAAAGAAATGAAAGAAGAGGCTCAGATATTT
TATAAGAATGCAGCTCCTTACAAACCAACTTTTTATTGGATTGACGTAGA
AGAGGAGACAATGTCTAACATGAATAAAGGTGTCCAAGCATTCCGAAAAG
AATTAAAAAGACTTGGTGCTAAAAATGTTGGTATCTACATTGGTACTTAC
TTTATGACTGAGCAAGGCATCTCTGTAAAAGGATTTGACGCTGTTTGGAT
TCCAACTTATGGTAGCGATTCTGGATACTATGAAGCGGCTCCGCAAACTG
AACTTAAATACGATTTACACCAATACACCTCTCAAGGTTATCTACCAGGA
TTCAATCAACCGCTTGATTTAAATCAAATTGCAGTTAATAAAGACAAGAA
GAAAACTTATGAGAAACTTTTTGGAAAAGTAAAAGAG

SEQ ID NO. 5312 

STRAIN 2603 frame: 1

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGXNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5313 

STRAIN 090 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN 
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5314 

STRAIN A909 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5315 

STRAIN H36B frame: 1

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5316 

STRAIN 18RS21 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5317 

STRAIN M732 frame: 1

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRIFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5318 

STRAIN COH1 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRIFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5319 

STRAIN M781 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRIFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5320 

STRAIN CJB110 frame: 2 

NTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKNI
SGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKEE
AQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQG
ISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQITVNKDK
KKTYEKLFGKVKE

SEQ ID NO. 5321 

STRAIN 1169NT frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5322 

STRAIN JM9130013 frame: 1 

TNTLKKELVEAKKTIPSVKASKVPQKSTSSKDKEFVLKPIIDVSGWQLPKEIDYDTLSKN
ISGVVIRVFGGSKISKTNNAAYTTGIDKSFKTHIKEFQKRNIPVAVYSYALGSSVKEMKE
EAQIFYKNAAPYKPTFYWIDVEEETMSNMNKGVQAFRKELKRLGAKNVGIYIGTYFMTEQ
GISVKGFDAVWIPTYGSDSGYYEAAPQTELKYDLHQYTSQGYLPGFNQPLDLNQIAVNKD
KKKTYEKLFGKVKE

SEQ ID NO. 5401 
STRAIN 2603 

TTGACTCACAAAAATATATTATTAACCATTATATTTGGATTATTT
ATGATTATATTATCAGCATGTGGTATGTCTAATAAGGAAATGGCTGGTATTGATAATTGG
GAACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTTGTTCCTATG
GGATTTGAAAGTCGTTCTGGTGACTATACCGGCTTTGATATTGATTTAGCTAATGCTGTT
TTTAAAGAATACGGTATTTCAGTGAAATGGCAGCCTATTAACTGGGATATGAAAGAAACT
GAACTTAATAATGGTAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGT
GCTAAAAAAGTCGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTTACTAAA
ACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGAGCCCAGTCG
GGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATATTTTAAAAAAGTTTGTAAAA
GGAAAAGAAGCAGTTCAATACGATACTTTCACTCAGGCTTTGATTGATTTAAAAAATAAC
CGTATTGATGGTCTTTTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGA
AATATAAAAGCTTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGA
GCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAACAGCTTCAT
AATAAGGGGAGATTTCAAAAAATCTCTTACAAATGGTTTGGTGAAGATGTTTATAGTAAA
GAA 

SEQ ID NO. 5402 

STRAIN 090

ATTGGGaACATTATC
AAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTTGTTCCTATG
GGATTTGAAAGCCGTTCTGGTGACTAtACCGGCTTTGATATTGATTTAGC
TAATGCTGTTTTTAAAGAATACGGTATTTCAGTGAAATGGCAGCCTATTA
ACTGGGATATGAAAGAAACTGAACTTAATAATGGTAATATAGACCTTATT
TGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGTCGCTTTTAC
AAACCCATATATGAATAATCATCAAGTAATTGTTACTAAAACTTCATCAC
ATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGAGCCCAGTCG
GGTTCATCTGGTTTTGATGCTTTTAATGCTAAACCTGATATTTTAAAAAA
GTTTGTAAAAGGAAAAGAAGCAGTTCAATACGATACTTTCACTCAGGCTT
TGATTGATTTAAAAAATAACCGTATTGATGGTCTTTTGATTGATGAAGTT
TATGCTAACTATTATTTAAAGCAAGAAGGAAATATAAAAGCTTATTATTT
TGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGAGCTCGCAAAG
TTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAACAGCTTCAT
AATAAGGGAAAATTTCAAAAAATCTCTTACAAATGGTTTGGTGAAGATGT
TTATAGTAAAGAA

SEQ ID NO. 5403 

STRAIN A909

ATTGGG
aACATTATCAAAAGGAAAAGAAAATTACTATTGGATTTGATAATACTTTT
GTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCTTTGATAT
TGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTGAAATGGC
AGCCTATTAACTGGGATAtgAAAGAAACTGAACTTAATAATGGTAATATA
GACCTTATTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGT
CGCTTTTACAAACCCATATATGAATAATCATCAAGTAATTGTTACTAAAA
CTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAAACTAGGA
GCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATAT
TTTAAAAAAGTTTGTAAAAGGAAAAGAAGCAGtTCAATACGATACTTTCA
CTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCTTTTGATT
GATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAATATAAAAGC
TTATTATTTTGTTAAAACTGCTTATCAAGGAGAAAATTTTGTAGTAGGAG
CTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGCTTTCAAA
CAGCTTCATAATAAGGGGAGATTTCAAAAAATCTCTTACAAATGGTTTGG
TGAAGATGTTTATAGTAAAGaA

SEQ ID NO. 5404 

STRAIN H36B

ATTGGGAACATTATCAAAAGGAAAAGAAAATTACTATTGGATT 
TGATAATACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATA 
CCGGCTTTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATT 
T C AG T GAAAT G G C AG C C T AT TAAC T G GGAT AT G AAAG AAACT GAAC T T AA 
TAATGGTAATATAGACCTTATTTGGAATGGTTATTCAAAAACGGCAGAAC 
GTGCT AAAAAAGT CGCT TTTACAAAC CCAT AT ATGAAT AAT CAT CAAGT A 
AT T GT T AC T AAAAC T T CAT C AC AT AT T AAT AGT AT T AAGGAT AT G AAG G G 
GAAAAAACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACG 
CTAAACCT GAT AT TTTAAAAAAGT T T GT AAAAGG AAAAGAAGC AG t T CAA 
TACGATACTTTC AC TCAGGCTTTGATTGATTT AAAAAAT AACCGTATTGA 
TGGT CTT TTG AT TG AT G AAGT t T AT GCTAACT ATT AT TT AAAGC AAGAAG 
GAAAT AT AAAAGCTT ATT ATTTT GTT AAAACTGCTT AT CAAGGAg AAAAT 
TTTGTAGTAGGAGCTCGTAAAGTTGATCGTAGACTAATTGAAAAGATTAA 
C AAAG CT T T C AAAC AG CT T C AT AAT AAGG G GAG AT T T CAAAAAAT C T C T T 
AC AAAT GGTTTGGT GAAG AT GT T TAT AGT AAAG AA 

SEQ ID NO. 5405 

STRAIN 18RS21 
ATTGGGAACATTA 

T C AAAAGG AAAAG AAAAT T AC TAT T GG AT T T GAT AAT AC TTTTGTTCC T A 
TGGGATTTGAAAGTCGTTCTGGTGACTAtACCGGCTTTGATATTGATTTA 
G C T AAT G C T GT T T T T AAAGAAT AC GGT AT T T C AGT GAAAT GG C AG C CT AT 
T AAC T GG G AT AT G AAAG AAAC T GAACT T AAT AAT GGT AAT AT AGAC CT T A 
TTTGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGTCGCTTTT 
AC AAAC C CAT AT AT GAAT AAT CAT CAAGT AAT T G T T AC T AAAAC T T CAT C 
AC AT AT T AAT AGT AT T AAG GAT AT GAAG G G G AAAAAAC T AGG AG C C C AGT 
CGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATATTTTAAAA
AAGT T T GT AAAAGGAAAAG AAG C AGT T C AAT AC GAT AC T T T C ACT C AGG C 
TTTGATTGATTTAAAAAATAACCGTATTGATGGTCTTTTGATTGATGAAG 
T T TAT G C T AACT AT TAT T T AAAGC AAG AAGGAAAT AT AAAAG CT TAT TAT 
T T T GT T AAAAC T GCT T AT C AAG G AGAAAAT T T T GT AGT AGG AG CT C G T AA 
AGT T GAT C GT AGAC T AAT T G AAAAG AT T AAC AAAG CT T T C AAAC AG C T T C 
AT AAT AAGG G GAG AT T T C AAAAAAT CT C T T AC AAAT G G T TT GGT G AAGAT 
GTTTATAGTAAAGAA 

SEQ ID NO. 5406 

STRAIN M732 

AT T GGGAAC AT T AT C AAAAGG AAAAGAAAAT T AC TAT T GG AT T T G AT AA 
TACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCT 
T T GAT AT T GAT T TAG C T AAT GCTGTTTT T AAAG AAT ACG GT AT T T C AGT G 
AAAT GG C AG C C T AT T AAC T G G GAT AT GAAAG AAAC T GAACT T AAT AAT GG 
T AAT AT AG AC CT TAT T T GG AAT GGT TAT T C AAAAAC G G C AG AAC GT G CT A 
AAAAAGT C G C T T T T AC AAAC C CAT AT AT G AAT AAT CAT C AAGT AAT T GT T 
ACTAAAACTTCATCACATATTAATAGTATTAAGGATATGAAGGGGAAAAA 
ACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAAC 
C T GAT AT T T T AAAAAAGT T T GT AAAAG G AAAAGAAGC AGT T C AAT AC GAT 
AC T T T C ACT C AG G CT T T GAT T GAT T T AAAAAAT AAC C GT AT T GAT GGT C T 
T T T GAT T GAT GAAGT T TAT G CT AAC TAT TAT T T AAAG C AAG AAGG AAAT A 
T AAAAG CT TAT TAT T T T G T T AAAACT G C T T AT C AAGG AGAAAAT T T T GT A 
GT AGGAG C T C GT AAAG T T GAT C G TAG AC T AAT T G AAAAG AT T AAC AAAG C 
T T T C AAAC AG C T T C AT AAT AAGGGG AGAT T T C AAAAAAT CT C T T AC AAAT 
GGTTTGGTGAAGATGTTTATAGTAAAGAA 

SEQ ID NO. 5407 

STRAIN COH1 

AT T G G GAAC AT T AT C AAAAG GAAAAG AAAAT T AC T AT T G GAT T T G AT AA 
TACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCT 
TTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTG 
AAATGGCAGCCTATTAACTGGGATATGAAAGAAACTGAACTTAATAATGG 
T AAT AT AG AC C T TAT T T GG AAT GGT TAT T C AAAAAC G G C AGAAC G T G C T A 
AAAAAGT CGCT T TTACAAACCCAT AT ATGAAT AAT CAT CAAGTAATTGTT 
AC T AAAAC T T CAT C AC AT AT T AAT AGT AT T AAG GAT AT G AAG GG GAAAAA 
ACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAAC 
CT GAT AT T T T AAAAAAGT T T G T AAAAG GAAAAG AAGC AGT T C AAT AC GAT 
ACTT T C AC TCAGGCTTTG ATT G AT TT AAAAAAT AACCGT AT TGATGGTCT 
TTTGATTGATGAAGTTTATGCTAACTATTATTTAAAGCAAGAAGGAAATA 
T AAAAG CT TAT TAT T T T GT T AAAACT GCT TAT C AAGG AGAAAAT T TT GT A 
G T AG GAG C T C GT AAAG T T GAT C G TAG ACT AAT T GAAAAG AT T AAC AAAG C 
T T T C AAAC AG C T T CAT AAT AAG G GGAGAT T T C AAAAAAT C T CT T AC AAAT 
GGTTTGGTGAAGATGTTTATAGTAAAGAA 

SEQ ID NO. 5408 

STRAIN M781

AT T GGGAAC AT TAT C AAAAG GAAAAG AAAAT TACT AT T G GAT T T GAT A 
ATACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGC 
TTTGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGT 
G AAAT G G C AG C C T AT T AACT G GG AT AT GAAAGAAAC T GAAC T T AAT AAT G 
GT AAT AT AG AC C T TAT T T G GAAT GGT TAT T C AAAAAC GG C AG AACGT GCT 
AAAAAAGT CGCTTTTACAAAC C CAT AT ATGAAT AAT CAT CAAGT AAT TGT 
T AC T AAAAC T T CAT C AC AT AT T AAT AG TAT T AAG GAT AT G AAG G GG AAAA 
AACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAA 
C C T GAT AT T T T AAAAAAGT T T GT AAAAG G AAAAGAAG C AGT T C AAT ACG A 
T AC TT TC ACT CAGGCTTTG ATT GAT TT AAAAAAT AACCGT ATT GATGGTC 
T T TT GATTGATGAAGT TT AT GCT AACT ATTATTTAAAGC AAGAAGGAAAT 
AT AAAAGCT TAT T AT TTTGTT AAAACT GCTTATCAAGGAGAAAATTT TGT 
AG TAG GAG C T C GT AAAGT T GAT C G TAG ACT AAT T GAAAAG AT T AAC AAAG 
CT TT CAAACAGCTT CAT AAT AAGGGG AGATT TCAAAAAAT CT CT TACAAA 
TGGTTTGGTGAAGATGTTTATAGTAAAGaA 

SEQ ID NO. 5409 
STRAIN CJB110

AT T GGG AAC AT TAT C AAAAG GAAAAG AAAAT T AC TAT T GGAT T T G AT AAT 
ACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCTT 
TGATATTGATTTAGCTAATGCTGTTTTTAAAGAATACGGTATTTCAGTGA 
AAT G G GAG C C TAT TAACT GG GAT AT G AAAGAAAC T G AAC T T AAT AAT G GT 
AAT AT AGAC C T TAT T T G G AAT GGT T ATT C AAAAAC GG C AG AAC G T G C T AA 
AAAAGT CG CT T T T AC AAAC C CAT AT AT GAAT AAT CAT CAAG T AAT T GT T A 
CTAAAACT T CAT CACATATT AAT AGTATTAAGGATAT GAAGGGGAAAAAA 
CTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAACGCTAAACC 
T GAT AT T T T AAAAAAG T T T GT AAAAG GAAAAG AAGCAGT T C AAT AC GAT A 
CTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCTT 
T T GAT T GAT G AAG T T TAT GC T AAC TAT TAT T T AAAG C AAG AAGG AAAT AT 
AAAAGC T TAT TAT T T T GT T AAAACT G C T TAT C AAGGAGAAAAT T T T GT AG 
T AGGAG C T C G T AAAG T T GAT CGT AGACT AAT T G AAAAGAT T AAC AAAG C T 
T T C AAAC AG C T T CAT AAT AAG G GG AGAT T T C AAAAAAT C T C T T AC AAAT G 
GTTTGGTGAAGATGTTTATAGTAAAGAA 

SEQ ID NO. 5410 

STRAIN 1169NT 

AT T GGG AAC AT TAT C AAAAGGAAAAGAAAAT T AC TAT T GGAT T T G AT AA 
TACTTTTGTTCCTATGGGATTTGAAAGTCGTTCTGGTGACTATACCGGCT 
T T GAT AT T GAT T TAG C T AAT GCTGTTTT T AAAGAAT AC GGT AT T T C AG T G 
AAAT G G C AG C C T AT T AAC T G G GAT AT G AAAGAAAC T G AAC T C AAT AAT G G 
T AAT AT AG AC CT T AT T T G GAAT GGT TAT T C AAAAAC GG C AGAAC GT G C T A 
AAAAAGT CGCT TTT ACAAACCC AT AT ATGAAT AAT CAT CAAGT AATT GTT 
AC T AAAAC T T C AT C AC AT AT T AAT AG TAT T AAG GAT AT GAAGG G GAAAAA 
ACTAGGAGCCCAGTCGGGTTCATCTGGTTTTGATGCTTTTAATGCTAAAC 
C T G AC AT T T T AAAAAAGT T T G T AAAAGG AAAAGAAG C AGT T C AAT AC GAT 
ACTTTCACTCAGGCTTTGATTGATTTAAAAAATAACCGTATTGATGGTCT 
TTT GAT T GAT G AAG T T T AT G CT AAC TAT TAT T T AAAG CAAG AAG GAAAT A 
T AAAAG CTT ATT ATT TTGTT AAAACT GCTTATCAAGGAG AAAAT TTTGT A 
GTAGGAGCTCGCAAAGTTGATCGTAGACTAATTGAAAAGATTAACAAAGC 
TTTCAAACAGCTTCATAATAAGGGGAAATTTCAAAAAATCTCTTACAAAT 
G GT T T GGT G AAG AT GTT T AT AGT AAAGAA 

SEQ ID NO. 5411 

STRAIN JM9130013 
AT T GGG AAC AT TAT C 

AAAAGGAAAAGAAAAT TACT ATT GGAT TTGAT AAT ACTTTTGTTCCTATG 
GGATTTGAAAGTCGTTCTGGTGACTAtACCGGCTTTGATATTGATTTAGC 
T AAT GC T G T T T T T AAAGAAT ACG G TAT T T C AG T GAAAT G G C AG C CT AT T A 
ACT GGG AT AT G AAAG AAACT G AAC T T AAT AAT G GT AAT AT AG AC C T T AT T 
TGGAATGGTTATTCAAAAACGGCAGAACGTGCTAAAAAAGTCGCTTTTAC 
AAAC C CAT AT AT GAAT AAT CAT CAAGT AAT T GT TACT AAAAC T T CAT C AC 
AT AT T AAT AG TAT T AAGGAT AT G AAGG GGAAAAAAC T AGGAG C C C AG T C G 
GGTTCATCTGGTTTTGATGCTTTTAACGCTAAACCTGATATTTTAAAAAA 
GTT T GT AAAAGG AAAAGAAG C AGT T C AAT AC GAT AC T T T C AC T C AGG CT T 
T GAT T GAT T T AAAAAAT AAC C GT AT T GAT GGTCTTTT GAT T GAT G AAGT T 
TAT GC TAACT AT TAT T T AAAG CAAGAAGG AAAT AT AAAAG CT T AT TAT T T 
T GT T AAAAC T G C T TAT CAAG G AGAAAAT TTT GT AG TAG GAG C T C GT AAAG 
T T GAT C G TAG ACT AAT T GAAAAG AT T AAC AAAG CT T T C AAAC AG CTT CAT 

AATAAGGGGAGATTTCAAAAAATCTCTTACAAATGGTTTGGTGAAGATGT 
TTAT AGT AAAGAA 

SEQ ID NO. 5412 

STRAIN 2603 frame: 1

LTHKNILLTIIFGLFMIILSACGMSNKEMAGIDNWEHYQKEKKITIGFDNTFVPMGFESR 
SGDYTGFDIDLANAVFKEYGISVKWQPINWDMKETELNNGNIDLIWNGYSKTAERAKKVA 
FTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQSGSSGFDAFNAKPDILKKFVKGKEAV 
QYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQEGNIKAYYFVKTAYQGENFVVGARKVD
RRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYSKE

SEQ ID NO. 5413 

STRAIN 090 frame: 3
WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGKFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5414 

STRAIN A909 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE 
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5415 

STRAIN H36B frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE 
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5416 

STRAIN 18RS21 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE 
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS 
KE 

SEQ ID NO. 5417 

STRAIN M732 frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE 
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5418 

STRAIN COH1 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE 
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS 
KE 

SEQ ID NO. 5419 

STRAIN M781 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE 
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5420 

STRAIN CJB110 frame: 3 

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE 
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5421 

STRAIN 1169NT frame: 3 
WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGKFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5422 

STRAIN JM9130013 frame: 3

WEHYQKEKKITIGFDNTFVPMGFESRSGDYTGFDIDLANAVFKEYGISVKWQPINWDMKE
TELNNGNIDLIWNGYSKTAERAKKVAFTNPYMNNHQVIVTKTSSHINSIKDMKGKKLGAQ
SGSSGFDAFNAKPDILKKFVKGKEAVQYDTFTQALIDLKNNRIDGLLIDEVYANYYLKQE
GNIKAYYFVKTAYQGENFVVGARKVDRRLIEKINKAFKQLHNKGRFQKISYKWFGEDVYS
KE 

SEQ ID NO. 5501 
STRAIN 2603 

ATGCTTAAATCTTTTTTGATTTTCTTAGTTCGCTTTTACCAAAAAAATATTTCTCCAGCT 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGAAGCTATTCAA 

AAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTATTTTGCGATGTCATCCCTTA 

GCCCACGGAGGAAATGATCCTGTCCCTGATCATTTTAGCTTAAGACGTAATAAAACGGAT 
ATATCAGAT 

SEQ ID NO. 5502 

STRAIN 090 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTT 

SEQ ID NO. 5503 

STRAIN A909

TTCCCAGCTAGCTGTCGTTATCGTCCAACtTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAgCTTAAGACGTAATAAAACGGATATA

SEQ ID NO. 5504 

STRAIN H36B 

TTCCCAGCTAGCTGTCGTTATCGTCCaACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTTCTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5505 

STRAIN 18RS21 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5506 

STRAIN M732 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA
TTTTGCGATGTCATCCCTTAgCCCACGGAGGAAATGATCCTGTCCCTGAT
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5507 

STRAIN COH1 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGAAGCTATTCAA 
AAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTATTTTGCGATGTCATCCCTTA 
GCCCACGGAGGAAATGAtCCTGtCCCTGATCATTTTAGCT 

SEQ ID NO. 5508 
STRAIN M781

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5509 

STRAIN CJB110 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGTTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5510 

STRAIN 1169NT 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTGGTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
TATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5511 

STRAIN JM9130013 

TTCCCAGCTAGCTGTCGTTATCGTCCAACTTGCTCTACGTATATGATAGA 
AGCTATTCAAAAACATGGTCTAAAAGGTGTTCTGATGGGGATTGCACGTA 
TTTTGCGATGTCATCCCTTAGCCCACGGAGGAAATGATCCTGTCCCTGAT 
CATTTTAGCTTAAGACGTAATAAAACGGATATATCAGAT

SEQ ID NO. 5512 

STRAIN 2603 frame: 1

MLKSFLIFLVRFYQKNISPAFPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPL 
AHGGNDPVPDHFSLRRNKTDISD 

SEQ ID NO. 5513 

STRAIN 090 frame: 1

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFS 

SEQ ID NO. 5514 

STRAIN A909 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
I 

SEQ ID NO. 5515 

STRAIN H36B frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5516 

STRAIN 18RS21 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5517 

STRAIN M732 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5518 

STRAIN COH1 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFS 

SEQ ID NO. 5519 

STRAIN M781 frame: 1

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 
SEQ ID NO. 5520
STRAIN CJB110 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5521 

STRAIN 1169NT frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVVMGIARILRCHPLAHGGNDPVPDYFSLRRNKTD
ISD 

SEQ ID NO. 5522 

STRAIN JM9130013 frame: 1 

FPASCRYRPTCSTYMIEAIQKHGLKGVLMGIARILRCHPLAHGGNDPVPDHFSLRRNKTD 
ISD 

SEQ ID NO. 5601 
STRAIN 2603 

aagaagcttacttttatttgggatttagatgggacattaatagattcgta 
tgtaccaattatggaagctcttgaagaaacctatcgtcattttggtttaa 
tatttgataaagaattaatccatgaatatattttacaggaatcagtgggg 
aaattattggtaaacctttcagaggaagagcaaatacctcatgaaaaact 
gaaagcatattttacaaaagaacaagaaagtcgagattctaaaatacatt 
taatgccatatgcaaaagagattttagaatggaccaaagaacaagatatc 
cccaattttatgtatacacataaaggagcaagtacgcattcagtgttgga 
aaccttgcagatctctcattattttgatgaaattttaactggtgtttcgg 
gattcgagcgaaaaccacatccacaagggattaattatttagttaaacga 
tattctttagataaatcaatgacttattacataggagatcgtccactaga 
tttggaggttgctcaaaatgctggtataaaatccataaacttaaggttag 
agaattccaaagaaaactataatatttcaagtctcaaagatataatatca 
cttgatttcactcgtttggat 

SEQ ID NO. 5602 
STRAIN COH1 

AAG AAG CT T AC T T T T AT T TG GGAT T TAG AT GGG AC AT T AA 
T AGATT CGTATGTAC CAATT ATGGAAGCT CT T GAAGAAACCT AT CGT CAT 
T T T G G C TT AAT AT T T G AT AAAG AAT T AAT C CAT G AAT AT AT T T T AC AGG A 
ATCAGTGGGGCAATTATTGGTAAACCTTTCAGAGGAAGAGCAAATACCTC 
AT G AAAAAC T GAAAG CAT AT T T T AC AAAAGAAC AAG AAAGT CGAGAT T CT 
AAAAT AC AT T T AAT GC CAT AT G C AAAAG AG AT T T T AGAAT GG AC C AAAG A 
AC AAG AT AT T C C C AAT T T T AT GT AT AC AC AT AAAGGAG C AAG T AC G CAT T 
CAGTGTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACT 
GGT GT T T C GGGAT T CGAG C G AAAAC C AC AT C C AC AAG GGAT T AAT TAT T T 
AGTT AAACGATAT T CT TT AGATAAAT C AAT G ACT TAT TAG AT AG GAG AT C 
GTCCACTAGATTTGGAGGTTGCTCAAAATGCTGGTATAAAATCCATAAAC 
T T AAG G T TAG AG AAT T C C AAAG AAAAC TAT AAT AT T T C AAGT CT C AAAG A 
TAT AAT AT C ACT T GAT T T C AC T C GT T T G GAT 

SEQ ID NO. 5603 

STRAIN A909

AAG AAG C T T ACT T T TAT T T GG G AT T T AGAT GGG AC AT T AAT 

AG AT T C GT AT GT AC C AAT T AT G G AAG C T C T T G AAG AAAC C TAT CGT CAT T T T GGT T T AAT 
ATTTGATAAAGAATTAATCCATGAATATATTTTACAGGAATCAGTGGGGAAATTATTGGT 
AAAC CTTT C AGAGGAAGAGCAAATAC CT CATGAAAAACT G AAAGCAT ATT TT ACAAAAGA 
ACAAGAAAGT CGAGAT T CT AAAAT ACAT T T AAT G CC AT AT GCAAAAGAG AT TTTAGAATG 
GACCAAAGAACAAGATATCCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTC 
AGTGTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGG 
AT T C GAG C G AAAAC C AC AT C C AC AAGG G AT T AAT TAT T T AGT T AAAC GAT AT T C T T TAG A 
T AAAT C AAT GAC T T AT T A CAT AGG AG AT C GT C C AC TAG AT T T G GAG G T T G CT C AAAAT G C 
T GGT AT AAAAT C CAT AAAC T T AAG G T TAG AG AAT T C C AAAG AAAAC TAT AAT AT T T C AAG 
T CT C AAAG AT AT AAT AT C ACT T GAT TTCACTCGT 

SEQ ID NO. 5604 

STRAIN H36B
AAGAAGCTTACTTTTATTTGGGATTTAGATGGGACATTAATAGATTCG
TATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGTTTAATATTTGAT 
AAAGAAT T AAT C CAT GAAT AT AT T T T AC AG GAAT C AGT GG G GAAAT TAT T GGT AAAC C T T 
T C AG AG G AAG AG C AAAT AC CT CAT GAAAAAC T GAAAG CAT AT T T T AC AAAAGAAC AAG AA 
AG T CG AGAT T C T AAAAT AC AT T T AAT G C CAT AT G C AAAAGAGAT T T TAG AAT GG AC C AAA 
GAAC AAG AT AT C CC C AAT T T T AT GT AT AC AC AT AAAGG AG C AAGT AC GC AT T C AGT GT T G 
GAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTCGAG 
C GAAAAC C AC AT C C AC AAG GGAT T AAT T AT T T AGT T AAAC GAT AT T C T T TAG AT AAAT C A 
AT G AC T TAT T AC AT AGG AG AT C GT C C AC T AGAT T T GG AG GT T GC T C AAAAT G C T G GT AT A 
AAAT C CAT AAAC T T AAGGT TAG AG AAT T C C AAAG AAAAC TAT AAT AT T T C AAG T C T C AAA 
GAT AT AAT AT C ACT T GAT T T C ACT C G T T T G GAT 

SEQ ID NO. 5605 

STRAIN 18RS21 

AAG AAG C T T AC T T T TAT T T G G GAT T TAG AT G GG AC AT T AAT AG AT T 

C GT AT GT AC C AAT T AT GGAAG C T C T T G AAG AAAC C T AT C GT C AT T T T GG T T T AAT AT T T G 
AT AAAG AAT T AAT C CAT GAAT AT AT T T T AC AG GAAT C AGT GG GG AAAT TAT T G G T AAAC C 
T T T C AGAG G AAG AG C AAAT AC C T CAT GAAAAAC T GAAAG CAT AT T T T AC AAAAGAAC AAG 
AAAGT C GAG AT T C T AAAAT AC AT T T AAT G C CAT AT G C AAAAGAGAT T T TAG AAT GG AC C A 
AAG AAC AAGAT AT C C C C AAT T T T AT GT AT AC ACAT AAAGG AG C AAGT AC GC AT T C AG T GT 
TGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTCG 
AGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAAAT 
C AAT G ACT TAT TAG AT AG G AGAT C GT C C ACT AG AT T T G GAGGT T G C T C AAAAT G C T GGT A 
T AAAAT C CAT AAAC T T AAGGT T AGAG AAT T C C AAAGAAAAC T AT AAT AT T T C AAG T CT C A 
AAGAT AT AAT AT C AC T T GAT T T C AC T C G T T T G GAT 

SEQ ID NO. 5606 

STRAIN M732 

AAG AAG C T T AC T T T TAT T T G G GAT T TAG AT GG G AC AT T AAT AG AT 

TCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTTAATATTT 
GAT AAAGAAT T AAT C CAT GAAT AT AT T T T AC AG GAAT C AGT GG GG C AAT TAT T G GT AAAC 
CT T T C AG AG G AAG AGC AAAT AC CT CAT G AAAAACT GAAAG CAT AT T T T AC AAAAGAAC AA 
GAAAGTCGAGATTCTAAAATACATTTAATGCCATATGCAAAAGAGATTTTAGAATGGACC 
AAAGAACAAGATATTCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGTG 
TTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTC 
GAG C GAAAAC C AC AT C C AC AAGGG AT T AAT T AT T TAG T T AAAC GAT AT T C T T TAG AT AAA 
T C AAT G ACT TAT T AC AT AGG AG AT C G T C C AC TAG AT T T G G AG GT T G C T C AAAAT G C T G G T 
AT AAAAT C CAT AAAC T T AAG GT TAG AG AAT T C C AAAG AAAACT AT AAT AT T T C AAG T CT C 
AAAGATATAATATCACTTGATTTCACTCGTTTGGAT 

SEQ ID NO. 5607 

STRAIN CJB110 

AAG AAG C T T AC T T T TAT T T G G GAT T TAG AT G G G AC AT T 

AATAGATTCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTT 
AAT AT T T GAT AAAG AAT T AAT C CAT GAAT AT AT T T T AC AGG AAT C AG T G G GG C AAT TAT T 
GG T AAAC C T T T C AG AG G AAG AG C AAAT AC C T CAT GAAAAAC T GAAAG CAT AT T T T AC AAA 
AG AAC AAG AAAGT C GAG AT T CT AAAAT AC AT T T AAT G C CAT AT G C AAAAGAGAT T T TAG A 
AT G G AC C AAAG AAC AAGAT AT C C C C AAT T T T AT G T AT AC AC AT AAAGG AGC AAGT AC G C A 
TTCAGTGTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTC 
T G GAT T CG AG C G AAAAC C AC AT C C AC AAGG GAT T AAT TAT T T AGT T AAAC GAT AT T C T T T 
AG AT AAAT C AAT GAC T T AT T AC AT AG G AGAT C GT C C C CT AG AT T T GG AGGT T G C T C AAAA 
T G CT G GT AT AAAAT C C AT AAACT T AAG G T TAG AG AAT T C C AAAG AAAAC T AT AAT AT T T C 
AAG T CT C AAGG AT AT AAT AT C AC T T GAT T T C AC T C G T T 

SEQ ID NO. 5608 

STRAIN 1169NT 

a AG AAG CT T AC T T T TAT T T G G GAT T TAG AT G GG AC AT T AAT AG AT T C G T AT GT AC C AAT T A 
TAGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTTAATATTTGATAAAGAATTAATCC 
AT GAAT AT AT T T TAG AG GAAT C AGT GGGG AAAT TAT T G GT AAAC C T T T C AG AG G AAG AG C 
AAAT ACC T CAT GAAAAAC T GAAAG CAT AT T T T AC AAAAGAAC AAG AAAGT C GAG AT T C T A 
AAAT AC AT T T AAT G C CAT AC G C AAAAGAGAT T T TAG AAT G GAC C AAAG AAC AAG AT AT CC 
C C AAT T T TAT G TAT AC AC AT AAAG GAG C AAGT AC G CAT T C AGT G T T G G AAAC C T T G C AG A 
TCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATTCGAGCGAAAACCACATC 
CACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAAATCAATGACTTATTACA 
TAGGAGATCGTCCCCTAGATTTGGAGGTTGCTCAAAATGCTGGTATAAAATCCATAAACT
T AAGGT T AGAGAATT CCAAAGAAAACTATAAT ATT T CAAGTCT CAAGGAT AT AAT AT CAC 
TTGATTTCACTCGTTTGGAT 

SEQ ID NO. 5609 

STRAIN JM9130013 

AAGAAGCTT ACTTTT ATT TGGGATTT AGAT GGGACATTAATAGA 

TTCGTATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGTTTAATATT 
T GAT AAAGAATTAAT CC ATGAAT AT ATTTT AC AGGAAT CAGTGGGGAAATT ATT GGT AAA 
CCTTTCAGAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGAACA 
AG AAAGT C GAG AT T C T AAAAT AC AT T T AAT G CC AT AT G C AAAAG AG AT T T T AGAAT GGAC 
CAAAGAACAAGATATCCCCAATTTTATGTATACACATAAAGGAGCAAGTACGCATTCAGT 
GTTGGAAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCGGGATT 
CGAGCGAAAACCACATCCACAAGGGATTAATTATTTAGTTAAACGATATTCTTTAGATAA 
AT C AAT G AC T TAT T AC AT AG GAG AT C G T C CAC TAG AT T T G GAG GT T GCT C AAAAT G CT G G 
TAT AAAAT C C AT AAAC T T AAGGT TAG AG AAT T C C AAAG AAAAC T AT AAT AT T T C AAG T C T 
CAAAGAT AT AATAT C ACT TGAT TT CACT CGT 

SEQ ID NO. 5610 

STRAIN 090 

AAGAAGCTTACTTTTATTTGG 

GATTTAGATGGGACATTAATAGATTCGTATGTACCAATTATGGAAGCTCT 
T GAAGAAACCT AT CGT CATT TT GGCTT AAT AT T T GAT AAAGAATTAAT CC 
AT G AAT AT AT T T T AC AGG AAT C AG T GG G G C AAT TAT T G GT AAAC C T T T C A 
GAGGAAGAGCAAATACCTCATGAAAAACTGAAAGCATATTTTACAAAAGA 
AC AAG AAAGT C GAG AT T C T AAAAT AC AT T T AAT G C CAT AT GC AAAAG AG A 
TTTT AGAATGGACC AAAGAAC AAGAT AT CC C C AAT TTTATGT AT ACACAT 
AAAGGAGCAAGTACGCATTCAGTGTTGGAAACCTTGCAGATCTCTCATTA 
TTTTGATGAAATTTTAACTGGTGTTTCTGGATTCGAGCGAAAACCACATC 
CACAAGGGATT AATT AT T T AGTT AAACGAT AT T CT T T AGAT AAAT CAAT G 
ACTTATTACATAGGAGATCGTCCCCTAGATTTGGAGGTTGCTCAAAATGC 
T GGT AT AAAAT C CAT AAACT T AAGGT T AGAGAATT C CAAAGAAAACT AT A 
AT ATTT C AAGT CT CAAGGAT AT AAT AT CAC T T GAT T T CACT CGT 

SEQ ID NO. 5611 

STRAIN M781 

AAG AAGCT T AC T T T TAT T T G G GAT T TAG AT G G G AC AT T AAT AG AT T C GT 
ATGTACCAATTATGGAAGCTCTTGAAGAAACCTATCGTCATTTTGGCTTA 
AT AT T T GAT AAAG AAT T AAT C CAT G AAT AT AT T T T AC AG G AAT C AGT GGG 
G CAAT TAT T GGT AAAC C T T T C AGAG G AAGAG C AAAT AC C T CAT GAAAAAC 
TGAAAGCATAT TTTACAAAAGAACAAGAAAGT CGAGATT yT AAAAT ACAT 
TT AAT GC CAT ATGCAAAAG AGAT T T T AGAATGGAC C AAAGAAC AAGAT AT 
T C C CAAT T T TAT GT AT AC AC AT AAAG GAG C AAG T AC G C AT T C AGT GT T GG 
AAACCTTGCAGATCTCTCATTATTTTGATGAAATTTTAACTGGTGTTTCG 
GG AT T C GAG C G AAAAC CAC AT C CAC AAG G GAT T AAT TAT T T AGT T AAAC G 
ATATTCTTTAGATAAATCAATGACTTATTACATAGGAGATCGTCCACTAG 
ATTTGGAGGTTGCTCAAAATGCTGGTATAAAATCCATAAACTTAAGGTTA 
GAG AAT T C C AAAG AAAAC TAT AAT AT T T C AAG T C T CAAAGAT AT AAT AT C 
ACTTGATTTCACTCGT 

SEQ ID NO. 5612 
STRAIN 2603 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5613 

STRAIN A909 frame: 1

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 
SEQ ID NO. 5614

STRAIN H36B frame: 1

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5615 

STRAIN 18RS21 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5616 

STRAIN M732 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5617 

STRAIN COH1 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD

SEQ ID NO. 5618 

STRAIN CJB110 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 

SEQ ID NO. 5619 

STRAIN 1169NT frame: 1

KKLTFIWDLDGTLIDSYVPIIEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTRLD 

SEQ ID NO. 5620 

STRAIN JM9130013 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGKLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 

SEQ ID NO. 5621 

STRAIN 090 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDSKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR

SEQ ID NO. 5622 

STRAIN M781 frame: 1 

KKLTFIWDLDGTLIDSYVPIMEALEETYRHFGLIFDKELIHEYILQESVGQLLVNLSEEE 
QIPHEKLKAYFTKEQESRDXKIHLMPYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQ 
ISHYFDEILTGVSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKSIN 
LRLENSKENYNISSLKDIISLDFTR 

SEQ ID NO: 5701 
STRAIN 2603

AT G C T T AT GAC AAAAAT AAT AGGACT GAC AGG AGGG AT AGC T T CT 

GG AAAGT C AACGGT AAC AAAAAT AAT AC G AGAAT C AGGTT T T AAAGT C AT AG AT G CG G AT 
C AAGT GGT T C AT AAAT T GC AAG C T AAGGGT G GG AAAC T T T AC C AAG CT T TAT T AGAAT GG 
TTGGGTCCCGAGATACTTGATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATG 
AT T T T T G C T AAT C C AG AC AAT AT GAAG AC AT C AG C T AGGCT AC AAAAT AGT AT CAT T C GT 
C AAG AGT TAG CAT GT C AG C GCG AC C AAT T AAAAC AAAC AG AAG AGAT AT T T T T C AT GGAT 
AT T CCT T TAT T GAT T GAAG AAAAGT AT AT AAAAT GGT T T GAT GAG AT T T G GT T GGT AT T T 
GTTGATAAAGAAAAACAATTACAACGATTAATGGCCCGTAACAACTACAGTCGAGAAGAA 
G C AG AAT T AC G AC T T T C AC AC C AAAT G C CT T T AAC AGAT AAAAAAAGT T T C G CT AGT C T T 
ATTATTGACAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTCAA 
CGTTTA 

SEQ ID NO: 5702 

STRAIN 090 

AAGT C AAC G GT AAC AAAAAT AAT AC GAG AAT C AG 

G T T T T AAAGT CAT AGAT G CG G AT C AAGT G G T T CAT AAAT T G C AAG CT AAG 
GGTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACT 
TGATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTG 
C T AAT C C AGAC AAT AT GAAG AC AT C AGC T AGG C T AC AAAAT AG TAT CAT T 
C GT C AAGAGT T AG CAT GT C AG C G CG AC C AAT T AAAAC AAAC AG AAGAGAT 
AT T T T T CGT GG AT AT T C C TT TAT T GAT T GAAG AAAAGT AT AT AAAAT GGT 
T T GAT GAG AT T T GG T T GGT AT T T GT T G AT AAAG AAAAAC AAT T AC AAC G A 
T T AAT GG C C C G T AAC AAC T AC AG T C GAG AAG AAG C AGAAT T ACGAC T T T C 
AC AC C AAAT G C C T T T AAC AG AT AAAAAAAG T T T C G CT AGT CT T AT TAT T A 
AT AAT AAT GGT GAT T T AAT AAC T T T AAAAG AG C AAAT AT T GG AT G C T C T T 
CAACGTTTA 

SEQ ID NO: 5703 

STRAIN A909 

AAGT C AAC G GT AAC AAAAAT AAT AC GAG AAT C AG 

GTTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAG 
GGTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACT 
TGATGCTGATGGTGAGTT GGAT AGAC CAAAGCTTTCTCAAATGATTTTTG 
C T AAT C C AG AC AAT AT GAAGAC AT C AG C TAG G C T AC AAAAT AG TAT CAT T 
CGT C AAG AGT TAG CAT G T C AG CG C GAC C AAT T AAAAC AAAC AG AAG AGAT 
AT T T T T CAT GGAT AT T C C T T T AT T GAT T GAAG AAAAGT AT AT AAAAT GGT 
TTGATGAGATTT GGT TGGT ATT TGTT GAT AAAG AAAAAC AAT TACAACGA 
T T AAT G GC C C GT a ACAAC T AC AGT C GAG AAG AAG C AG AAT T ACG ACT T T C 
ACACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTG 
AC AAT AAT GGT GAT T T AAT AAC T T T AAAAG AGC AAAT AT T GGAT G C T C T T 
CAACGTTTA 

SEQ ID NO: 5704 

STRAIN H36B

AAGTCAACGGTAACAAAAATAATACGAGAATCAGG 

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG 
G T G GGAAAC T T T AC C AAG CT T TAT TAG AAT GGT T G G G T C C CG AG AT AC T T 
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC 
T AAT C C AG AC AAT AT GAAGAC AT C AG CT AG G C T AC AAAAT AG TAT CAT T C 
G T C AAGAGT TAG CAT G T C AG C G C GAC C AAT T AAAAC AAAC AG AAG AG AT A 
T T T T T CAT G GAT AT T C CT T T AT T G AT TG AAG AAAAGT AT AT AAAAT G GT T 
T GAT G AGAT T T G GT T G GT AT T T GT T GAT AAAG AAAAAC AAT T AC AAC GAT 
T AAT GGC C C G t AAC AAC T AC AG T C GAG AAG AAG C GG AAT T AC G ACT T T C A 
C AC C AAAT AC C T T T AAC AG AT AAAAAAAGT T T C G C T AGT C T TAT TAT T G A 
TAATAATGGTGATTTAATAACTTTAAAAGAGCAAATGTTGGATGCTCTTC 
AACGTTTA 

SEQ ID NO: 5705 

STRAIN 18RS21 

AAGT C AAC GG T AAC AAAAAT AAT AC GAG AAT C AG G 

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG 
GTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTT 
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC 
TAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTC
GTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATA
TTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTT
TGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGAT
TAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCA
CACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGA
CAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTC
AACGTTTA 

SEQ ID NO: 5706 
STRAIN M732 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGGTT
TTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGGGT 
GGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTTGA 
TGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGCTA 
ATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTCGT 
CAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATATT
TTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTTTG
ATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGATTA
ATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCACA
CCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGACA 
ATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTCAA 
CGTTTA 

SEQ ID NO: 5707 

STRAIN COH1 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGGT

TTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGGG
TGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTTG
ATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGCT
AATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTCG
TCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATAT
TTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTTT
GATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGATT
AATGGCCCGTaACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCAC
ACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGAC
AATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTCA
ACGTTTA

SEQ ID NO: 5708 

STRAIN M781

AAGTCAACGGTAACAAAAATAATACGAGAATCAGG

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG
GTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTT 
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC 
TAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTC
GTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATA
TTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTT
TGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGAT
TAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCA
CACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGA
CAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGCTCTTC
AACGTTTA 

SEQ ID NO: 5709 

STRAIN CJB110 

AAGTCAACGGTAACAAAAATAATACGAGAA

TCAGGTTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGC
TAAGGGTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGA
TACTTGATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATT
TTTGCTAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTAT
CATTCGTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAG
AGATATTTTTCGTGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAA
TGGTTTGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACA
ACGATTAATGGCCCGTaACAACTACAGTCGAGAAGAAGCAGAATTACGAC
TTTCACACCAAATGCCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATT
ATTAATAATAATGGTGATTTAATAACTTTAAAAGAGCAAATATTGGATGC
TCTTCAACGTTTA 

SEQ ID NO: 5710 

STRAIN 1169NT 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGG

TTTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGG
GTGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTT
GATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGC
TAATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTC
GTCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATA
TTTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTT
TGATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGAT
TAATGGCCCGTAACAACTACAGTCGAGAAGAAGCAGAATTACGACTTTCA
CACCAAATACCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGA
TAATAATGGTGATTTAATAACTTTAAAAGAGCAAATGTTGGATGCTCTTC
AACGTTTA 

SEQ ID NO: 5711 

STRAIN JM9130013 

AAGTCAACGGTAACAAAAATAATACGAGAATCAGGT

TTTAAAGTCATAGATGCGGATCAAGTGGTTCATAAATTGCAAGCTAAGGG
TGGGAAACTTTACCAAGCTTTATTAGAATGGTTGGGTCCCGAGATACTTG
ATGCTGATGGTGAGTTGGATAGACCAAAGCTTTCTCAAATGATTTTTGCT
AATCCAGACAATATGAAGACATCAGCTAGGCTACAAAATAGTATCATTCG
TCAAGAGTTAGCATGTCAGCGCGACCAATTAAAACAAACAGAAGAGATAT
TTTTCATGGATATTCCTTTATTGATTGAAGAAAAGTATATAAAATGGTTT
GATGAGATTTGGTTGGTATTTGTTGATAAAGAAAAACAATTACAACGATT
AATGGCCCGTAACAACTACAGTCGAGAAGAAGCGGAATTACGACTTTCAC
ACCAAATACCTTTAACAGATAAAAAAAGTTTCGCTAGTCTTATTATTGAT
AATAATGGTGATTTAATAACTTTAAAAGAGCAAATGTTGGATGCTCTTCA
ACGTTTA

SEQ ID NO: 5712 

STRAIN 2603 frame: 1

MLMTKIIGLTGGIASGKSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEI 
LDADGELDRPKLSQMIFANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLI 
EEKYIKWFDEIWLVFVDKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNN 
GDLITLKEQILDALQRL 

SEQ ID NO: 5713 

STRAIN 090 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFVDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIINNNGDLITLKEQILDALQR 
L 

SEQ ID NO: 5714 

STRAIN A909 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR 
L 

SEQ ID NO: 5715 

STRAIN H36B frame: 1

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQIPLTDKKSFASLIIDNNGDLITLKEQMLDALQR 
L 

SEQ ID NO: 5716 
STRAIN 18RS21 frame: 1

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5717 

STRAIN M732 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5718 

STRAIN COH1 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5719 

STRAIN M781 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5720 

STRAIN CJB110 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI 
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFVDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQMPLTDKKSFASLIINNNGDLITLKEQILDALQR
L 

SEQ ID NO: 5721 

STRAIN 1169NT frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQIPLTDKKSFASLIIDNNGDLITLKEQMLDALQR 
L 

SEQ ID NO: 5722 

STRAIN JM9130013 frame: 1 

KSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILDADGELDRPKLSQMI
FANPDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFV 
DKEKQLQRLMARNNYSREEAELRLSHQIPLTDKKSFASLIIDNNGDLITLKEQMLDALQR 
L 

SEQ ID NO. 5801 
STRAIN 2603 

ATGTTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTATGATTTTAGCCTTTTTATTG 
GTAAATAATAGTTATTTTAGACAGTTAATTGAAGAGCGGTCTAAACGTGAAACGGTAGTC 
CTTGTCATCATTTTCGGCTTGTTTGTTATTATATCTAATATAACAGGAATTGAAATAAAA 
GGGGATCGAAGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTT 
GCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCTCTGGTTGGA 
TCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAAGGAAGCTTTTCAGGTTCT 
TTCTATATTGTCAGTTCAGTTCTAGTCGGCATTGTTAGCGGAAAGATTGGTGATAAGCTT 
AAGGAAAACCATCTCTACCCTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAA 
AGTATCCAGATGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTC 
ATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATTTTGAAAACT 
TATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAGAGATGTTCTTGAATTGACT
CGACAGACTCTGCCCTACCTTAGACAAGGTTTGACACCGCAATCTGCTAGGAGCGTTTGC 
GAAATTATAAAGAGGCATACTAACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTA 
TTAGCTCATATTGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGAC 
TTATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAAGCGGCGATT
TCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGTAGTTCCTCTAAAAATAAAT 
GATAAAACTGTGGGTGCCTTAAAAATGTACTTTGCAGGAGATAAGACAATGTCTGAGGTG
GAGGAAAACCTAGTCCTTGGTTTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATA 
ACAGAGGAACAAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATC
AACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGTATTGATTCT
GATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTTAGAACAAGTTTGCAGGGT 
GGTCAGGATCGTGAGGTAACGCTTGAGCAAGAAAAATCACATGTGGATGCTTATATGAAT 
GTTGAAAAATTACGTTTCCCTGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAA 
AAAATGAAGTTACCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCT 
TTCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGATGGTCATTAT
TATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGATACTATCATTGATAAATTA
GGTCAAGAAACAGTTGCAGAGAGTAAGGGTACAGGTACTGCTCTAGTTAATCTAAATAAC
AGGCTGAATTTATTATATGGTAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGT
ACAAAAGTTTGGTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAAT 
TCT 



SEQ ID NO. 5802 

STRAIN 090

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTAT
GATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTG
AAGAGCGGTCTAAACGTGAAACGGTAGTACTTGTCATCATTTTCGGCTTG 
TTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAG 
TTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTTG 
CTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCT
CTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCA 
AGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCA 
TTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCT
TCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGAT
GCTATTTGTTGGTATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCA
TTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATT 
TTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACGAG
AGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTCAGACAAGGTT
TGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACT
AACTTTGATGCTGTAGGATTAACAGATCGGTCAAACGTATTAGCTCATAT
TGGTGTTGGCCATGATCACCATATTGCAGGACAACCAGTCAAAACAGACC
TATCTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAA
GCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGT
AGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTACT
TTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGT
TTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACA 
AAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAATCA
ACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGT 
ATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTT
TAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAG
AAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCT 
GATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGTT 
ACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTAGACATGCTT 
TCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGAT
GGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGA
TACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGTA 
CAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGT 
AGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTG
GTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAATT
CT 



SEQ ID NO. 5803 

STRAIN A909

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTAT 

GATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTG 

AAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTG 

T T T GT T AT TAT AT C T AAT AT AAC AG GAAT T G AAAT AAAAG GG GAT C G AAG 

TTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTTG 

CTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCT 
CTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCA
AGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCA 
T T GT T AG C G GAAAG AT T G GT G AT AAG CT T AAGG AAAAC CAT CT C T AC CC T 
T C AAC AAGC C AAGT TAT T T T AAT T AGT AT TAT T G C C G AAAGT AT C C AGAT 
GC T AT T T GT T GGC AT T T T T AC AG G AT GGG AAC T T GT C AAAAT GAT T GT C A 
TTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATT 
T T GAAAAC T T AT T T GT C AAAT GAAAG T C AGT T AC G C G C AGT T C AAAC GAG 
AGAT GTTCTTGAATTG ACT CGACAGACTCTGCCCTACCTTAGACAAGGTT 
T G AC AC C G C AAT C T G C TAG GAG C GT T T G C G AAAT T AT AAAG AG GC AT AC T 
AACT T T GAT G CT G T G GG AT T AAC AG AT C G GT C AAAC GT AT TAG C T CAT AT 
TGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGACT 
TAT C T AAAAGT GT TAT T T T T GAT G G CG AAC C AAG AAT T G CG C AAG AT AAA 
GCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGT 
AGT T C C T CT AAAAAT AAAT GAT AAAAC TGTGGGTGCCT T AAAAAT GT AC T 
T T GC AGG AG AT AAG AC AAT GT CT G AG G T G GAG GAAAAC C TAG TCCTTGGT 
TTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAACA 
AAAT AAGT TAG C C AG TAT G G C AG AG AT AAAGG CT T T AC AAG C AC AAAT C A 
ACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGT 
AT T GAT T C T GAT AAAG C AC G T TAT G C AC T GAT GC AG T T AAGT AC T T T T T T 
TAGAACAAGTTTGCAGGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAG 
AAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCT 
G AT AAAT AT CAGTT AT CTTATG AT ATT AGT GC AC C AG AAAAAATGAAGTT 
ACCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCTT 
T C AAAGAAC GT AAG AC GG AC AAC C AT AT AT T GGT T C AAAT AAAG C C AG AT 
GGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGA 
T AC TAT CAT T GAT AAAT TAG G T C AAG AAAC AGT T G C AG AG AGT AAGGG T A 
CAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGT 
AGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTG 
GT AT C GAAT AC C T AAT AG AAT AAGGG AGG AT GAG CAT G AAAAT T T T AAT T 
CT 

SEQ ID NO. 5804 

STRAIN H36B 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTATG 

ATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTGA 

AGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTGT 

TTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAGT 

TTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTTGC 

TAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCTC 

TGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAA 

GGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCAT 

TGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCTT 

C AAC AAG C C AAGT TAT T T T AAT T AGT AT TAT T G C CG AAAGT AT C C AG AT G 

CTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCAT 

TCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATTT 

T G AAAACT T AT T T GT C AAAT G AAAGT C AG T T AC G C GC AGT T C AAAC GAGA 

GATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGGTTT 

GACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATACTA 

AC T T T GAT G C T GT GG G AT T AAC AG AT C G GT C AAAC GT AT TAG C T CAT AT T 

GGT GT T GGC C AT GAT C AC CAT AT T G C AGG AC AAC C G GT C AAAAC AG AC T T 

AT CT AAAAGT GTTATTTTTGATGGCGAACCAAGAATTGCGCAAGAT AAAG 

CGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGTA 

GTTCCTCT AAAAAT AAAT GAT AAAACT GTGGGTGCCTT AAAAAT GT ACT T 

T G C AG GAG AT AAG AC AAT G T C T G AGGT G GAG G AAAAC C T AGT CCTTGGTT 

T AG C G C AAAT AT T T T C AG G AC AACT GG C AAT G G GG AT AAC AG AG G AAC AA 

AAT AAGT TAG C C AG T AT G G C AG AG AT AAAGG C T T T AC AAG C AC AAAT C AA 

CCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGTA 

TTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTT 

AG AAC AAGT T T G C AG GG T GGT C AG GAT C GT G AG GT AAC G C T T G AG C AAG A 

AAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCTG 

AT AAAT AT C AGTT AT CT T ATGAT ATT AGT GC ACCAGAAAAAAT GAAGT T A 

CCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCTTT 

CAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGATG 

GT CAT TAT TAT TGTGTTTCTGT T AGT G AC AAT G G AC AAGG AAT C T C AG AT 
ACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGTAC
AG GT ACT G CT CT AG T T AAT CT AAAT AAC AG G CT GAAT T T AT T AT AT GGT A 
GTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTGG 
TAT C GAAT AC CT AAT AG AAT AAG G GAG GAT GAG CAT GAAAAT T T T AAT T C 
T 

SEQ ID NO. 5805 
STRAIN 18RS21 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTATG 
AT T T TAG C CT T T T TAT T GGT AAAT AAT AGT TAT T T TAG AC AGT T AAT T G A 
AGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTGT 
TTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCGAAGT 
TTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACTTGC 
T AAT AC AAGG AC T T T AGT TAT T AC AAC GG C AAG TTTGGTTGGT GG AC C T C 
TGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAA 
GG AAGC T T T T C AGG T T CT T T C TAT AT T GT C AGT T C AGT T C TAG T C GG CAT 
T GT T AG C GG AAAGAT T GGT GAT AAG C T T AAGGAAAAC CAT CT C T AC C C T T 
CAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGATG 
CTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCAT 
TCCAAT GAT GAT TTT AAAT AGTTTAGGTTCCACACTTTTCCTTGCGATTT 
T GAAAACT TAT T T GT C AAAT GAAAGT C AG T T AC GC G C AG T T C AAAC GAGA 
GAT G T T CT T GAAT T G AC T C G AC AG ACT CT G C C C T AC CT T AG AC AAGGT T T 
G AC AC CG C AAT CT G C T AGG AGCG TT T GCG AAAT T AT AAAG AG G C AT ACT A 
ACT T T GAT GCT GT GG G AT T AAC AG AT C GGT C AAAC GT AT TAG CT CAT AT T 
G GT G T T GG C C AT GAT C AC CAT AT T GC AG G AC AAC C G GT C AAAAC AG ACT T 
AT C T AAAAGT GT TAT T T T T GAT G G C G AAC C AAG a AT T G C GC AAGAT AAAG 
CGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGTA 
GTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTACTT 
TGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGTT 
T AG CG C AAAT AT TTT C AGG AC AAC T GG C AAT GGGG AT AAC AG AG G AAC AA 
AAT AAGT TAG C C AGT AT GG C AG AG AT AAAGG C T T T AC AAGC AC AAAT C AA 
CCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGTA 
TTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTT 
AG AAC AAG T T T GC AGG GT GGT C AGG AT CGT G AG GT AAC G CT T G AGC AAGA 
AAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCTG 
AT AAAT AT C AG T TAT C T TAT GAT AT T AGT G C AC C AG AAAAAAT GAAGT T A 
C CAC CT T T T GGT T T AC AG GT ACT GGT AG AG AAT G C AG T T C GAC AT G C T T T 
C AAAG AACGT AAG ACG GAC AAC CAT AT AT T G G T T C AAAT AAAG C C AG AT G 
GTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGAT 
ACTAT CAT TGAT AAAT T AGGT CAAGAAACAGTTG CAGAGAGTAAGGGT AC 
AGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGTA 
GTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTGG 
TAT CGAAT ACCT AAT AGAATAAGGG AGGAT GAG CAT GAAAAT T T T AAT T C 
T 

SEQ ID NO. 5806 

STRAIN M732 

T T GAT GGTGTTGT TAT T C C AAAGG C TAG GAAT TAT TAT GAT 

TTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAATTGAAG 

AGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTTGTTT 

GTT AT TAT AT CT AAT AT AAC AGGAATT G AAATAAAAGGGGAT CGAAGT TT 

GGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTTGCTA 

AT AC AAGG ACT T T AGT TAT T AC AACG G C AAGT T T G GT T G G T GG AC C T C T G 

GTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCAAGG 

AAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGCATTG 

TTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCTTCA 

ACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAGATGCT 

ATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCATTC 

CAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATTTTG 

AAAACT T AT T T G T C AAAT GAAAGT C AG T T AC G C GC AGT T C AAAC GAG AG A 

T GT T C T T GAAT T GAC T C GAC AG AC T C T G C C CT ACC T TAG AC AAGG T T T G A 

CAC C G C AAT C T G C T AGG AG C G T T T G CG AAAT TAT AAAG AG G CAT ACT AAC 

TTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCATATTGG 

TAT T G G C C AT GAT CAC CAT AT T G C AGG AC AAC C GGT C AAAAC AG ACT TAT 
CTAAAAGTGTTATTTTTGATGGCGAACCAAGAATTGCGCAAGATAAAGCG
GC G At TTCTTGTC C AG AT C AC AACT G T C AGT T AAAT T CT G C T AT T GT AG T 
TCCTCTAAAAATAAATGATAAAACTGTGTGTGCCTTAAAAATGTACTTTG 
CAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTGGTTTA 
G CGCAAAT AT T T T C AGG AC AAC T G G C AAT GG GGAT AAC AG AGG AAC AAAA 
T AAGT T AGC C AGT AT GG C AG AGAT AAAG GC T T T AC AAGC AC AAAT C AAC C 
C T C AT T T C T T CT T T AAT GC CAT T AAC AC AAT T AGT G CAT T AAT C C GT AT T 
GATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTTTTAG 
AAC AAGT T T GC AAGGT GGT C AG GAT C GT G AG GT AACG C T T GAG C AAG AAA 
AATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCTGAT 
AAAT AT C AGT TAT C T TAT GAT AT T AGT G C AC C AG AAAAAAT G AAGT T AC C 
GCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCTTTCA 
AAGAACGT AAG ACG G AC AAC CAT AT AT T G GTT C AAAT AAAGC CAG AT GGT 
CAT TAT TAT TGTGTTTCTG T T AGT G AC AAT GGAC AAGG AAT C T C AGAT AC 
TAT CAT T GAT AAAT TAG GT C AAGAAAC AGT T GC AGAGAGT AAG GGG AC AG 
GTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGTAGT 
GTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTGGTA 
T CGAAT AC C T AAT AG AAT AAGG G AGG AT GAG CAT G AAAAT T T T AAT T C T 

SEQ ID NO. 5807 

STRAIN COH1 

T T GAT GGTGTTGT TAT T C C AAAGGC T AGG AAT TAT 

TATGATTTTAGCCTTTTTATTGGTAAATAATAGTTATTTCAGACAGTTAA 
TTGAAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGC 
TTGTTTGTTATTATATCTAATATAACAGGAATTGAAATAAAAGGGGATCG 
AAGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCAC 
TTGCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGA 
CCTCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTT 
TCAAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCG 
G CAT T GT T AG C GGAAAGAT T GGT GAT AAG C T T AAG G AAAAC CAT CT CT AC 
CCTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCA 
GATGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTG 
TCATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCG 
ATTTTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAAC 
G AGAG AT GT T C T T G AAT T G AC T C G AC AG AC T C T G C C C T AC CT TAG AC AAG 
GT T T G AC AC C G C AAT CT G C T AG GAG C GT T T G C G AAAT TAT AAAGAGG CAT 
ACT AAC T T T GAT G CT GT G GG AT T AAC AG AT C GGT C AAAC GT AT TAG CT C A 
TATTGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAG 
AC T TAT C T AAAAG T G T T AT T T T T GAT G G C G AAC C AAG AAT T G C G C AAG AT 
AAAGCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTAT 
T GT AGT T C CT C T AAAAAT AAAT GAT AAAAC TGTGTGTGCCT T AAAAAT G T 
AC T T T G C AGG AGAT AAG AC AAT GT CT G AGGT G G AGG AAAAC C T AGT C C T T 
G G T T TAG C G C AAAT AT T T T C AGG AC AAC T G G C AAT G GGGAT AAC AGAG G A 
AC AAAAT AAGT TAG C CAG TAT GG C AGAG AT AAAG G C T T T AC AAG C AC AAA 
T C AAC C C T CAT TTCTTCTT T AAT G C C AT T AAC AC AAT T AGT G C AT T AAT C 
CGTATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTT 
TTTTAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGC 
AAGAAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTC 
C CT GAT AAAT AT C AGT TAT C T T AT GAT AT TAG T G C AC CAG AAAAAAT G AA 
GTTACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATG 
C T T T C AAAG AAC G T AAG AC G G A C AAC CAT AT AT T G G T T C AAAT AAAG CCA 
GATGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTC 
AGAT ACT AT CATT GAT AAAT T AGGT CAAGAAAC AGT T GCAGAGAGT AAGG 
GGACAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATAT 
GGTAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGT 
TT GGT AT CGAATAC CT AAT AGAATAAGGGAGGAT G AGC AT GAAAAT TTT A 
ATTCT 

SEQ ID NO. 5808 

STRAIN M781 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTA 

T GAT T T TAG C C T T T T TAT T G G T AAAT AAT AG T T AT T T CAG AC AGT T AAT T 

GAAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCTT 

GT T T G T TAT TAT AT CT AAT AT AAC AGG AAT T G AAAT AAAAG G G GAT C G AA 
GTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTT
GCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACC 
TCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTC 
AAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGGC 
AT T G T TAG C GG AAAG AT T GGT G AT AAGCT T AAGGAAAAC CAT C T C TAG C C 
T T C AAC AAG C C AAG T T AT T T T AAT TAG TAT TAT T G C C G AAAG TAT C C AGA 
T GCT AT T T GT T GG C AT T T T T AC AGG AT G GG AACT T GT C AAAAT GAT T GT C 
ATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGAT 
TTTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGtTCAAACGA 
GAG AT G T T C T T GAAT T GACT CGAC AGACT C T G C CC T AC C T T AG AC AAGGT 
T T G AC AC C G C AAT CT G C T AGGAG CGT T T GC G AAAT T AT AAAGAG G CAT AC 
T AACT T T GAT G C T GT GGG AT T AAC AG AT C GGT C AAAC GT AT T AG C T CAT A 
TTGGTGTTGGC CAT GAT C AC CAT AT T G C AGG AC AAC C GG T C AAAAC AG AC 
T TAT C T AAAAGT G T TAT T T T T GAT GGCGAAC C AAGAAT T G CG C AAG AT AA 
AGCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTG 
TAGTTCCTCTAAAAATAAATGATAAAACTGTGTGTGCCTTAAAAATGTAC 
T T T GC AG GAG AT AAG AC AAT GT CT G AG GT GGAGGAAAAC CT AGT C C T T G G 
T T T AG C G C AAAT AT T T T C AG G AC AACT GG C AAT GGGG AT AAC AG AG GAAC 
AAAAT AAG T TAG C C AGT AT G G C AGAGAT AAAGG CT T T AC AAG C AC AAAT C 
AACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCG 
TATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTTT 
T T AG AAC AAGT T T G C AAGGT GGT C AGG AT CGT GAG GT AAC GC T T GAG C AA 
GAAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCC 
TGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAGT 
TACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGCT 
TTCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGA 
TGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAG 
ATACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGGG 
AC AGGT AC T G CT C T AG T T AAT CT AAAT AAC AGG C T GAAT T TAT TAT AT GG 
T AGT GT AAG T T G C C T T CAT T T T T CGAGCG AC AAGAAT G GT AC AAAAGT TT 
G GT AT CG AAT AC C T AAT AG AAT AAG G GAG GAT GAG CAT GAAAAT T T T AAT 
TCT 

SEQ ID NO. 5809 

STRAIN CJB110 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATTAT 

GAT T T TAG C CT T T T T AT T G G T AAAT AAT AGT TAT T T C AG AC AG T T AAT T G 

AAGAGCGGTCTAAACGTGAAACGGTAGTACTTGTCATCATTTTCGGCTTG 

TTT GTTATT AT AT CT AAT AT AACAGGAATT GAAAT AAAAGGGGAT CGAAG 

TTTGGTCGAGCGCCCTTTTCTAACAACGATTTCCCATTCTGACTCACTTG 

CTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGACCT 

CTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTTCA 

AG G AAG C T T T T C AGG T T C T T T CT AT AT T GT C AGT T C AGT T C T AGT CGGC A 

TTGTTAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACCCT 

T C AAC AAG C C AAGT TAT T T T AAT TAG TAT TAT T G C C G AAAG TAT C C AG AT 

GCTATTTGTTGGTATTTTTACAGGATGGGAACTTGTCAAAATGATTGTCA 

TTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGATT 

T T GAAAAC T TAT T T GT C AAAT G AAAGT C AG T T AC G C GC AG T T C AAAC GAG 

AG AT GT T CT T GAAT T G AC T C GAC AG ACT CT G C C C T AC C T C AG AC AAG GT T 

T G AC AC C G C AAT CT G C T AG G AG CG T T T G CG AAAT TAT AAAGAG G CAT AC T 

AACTTTGATGCTGTAGGATTAACAGATCGGTCAAACGTATTAGCTCATAT 

TGGTGTTGGC CAT GAT C AC CAT AT T G C AGG AC AAC C AG T C AAAAC AG AC C 

TAT C T AAAAGT GT TAT T T T T GAT G GC G AAC C AAG AAT T G C G C AAG AT AAA 

GCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATTGT 

AGT T CCT CT AAAAAT AAATGAT AAAACTGT GGGT GCCT T AAAAAT GT ACT 

T T G C AG G AG AT AAGAC AAT GT C T GAG G T G G AGG AAAAC CT AGT C C T T G G T 

TT AGCGC AAAT ATT TTCAGGACAACTGGCAATGGGGAT AAC AGAGGAACA 

AAAT AAGT TAGCCAGTATGGC AGAGAT AAAGGCT T TACAAGCAC AAAT CA 

ACCCTCATTTTTTCTTTAATGCCATTAACACAATTAGTGCATTAATCCGT 

AT T GAT TCT G AT AAAGC AC GT T AT G C AC T GAT GC AGT T AAG T AC T T T T T T 

TAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCAAG 

AAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCCCT 

GAT AAAT AT C AGT T AT CTT AT GAT ATT AGTGCAC CAGAAAAAAT GAAGT T 

ACCGCCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTAGACATGCTT 
TCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAGAT
GGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCAGA 
T ACTAT CAT T GAT AAAT T AGGT C AAGAAACAGTTGCAGAGAGTAAGGGT A 
CAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATGGT 
AGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTTTG 
GT AT C G AAT ACCT AAT AGAAT AAG GG AGGAT G AGC AT GAAAAT T T T AAT T 
CT 

SEQ ID NO. 5810 

STRAIN 1169NT 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATT 

ATGATT TT AGC CTTTTT ATT GGT AAAT AAT AGTTATTTCAGACAGTT AAT 
TGAAGAGCGGTCTAAACGTGAAACGGTAGTACTTGTCATCATTTTCGGCT 
T GT TT GTT AT TAT AT CT AAT AT AAC AGGAAT T GAAAT AAAAGGGG AT CG A 
AGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACT 
TGCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGAC 
CTCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTT 
CAAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGG 
CATTGTGAGCGGAAAGATTGGTGATAAGCTTAAGGAAAACCATCTCTACC 
CTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAG 
AT GCT ATT T GTT GGCAT TTTT ACAGGATGGGAACT T GT CAAAAT GATTGT 
CAT T C C AAT GAT GAT T T T AAAT AGT T T AGGT T C C AC ACT T T T C C T T G CG A 
TTTT G AAAAC T TAT T T GT C AAAT GAAAGT C AGT T AC GC G C AG T T C AAACG 
AG AGAT GT T C T T GAAT T G AC T C GAC AG AC T C T G C C CT AC CT T AGAC AAGG 
T T T G AC AC C G C AAT C T G C T AGG AGC GT T T G C GAAAT TAT AAAG AG GC AT A 
CTAATTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCAT 
ATT G GT GT T G GC C AT GAT C AC CAT AT T G C AG GAC AAC C AGT C AAAAC AG A 
C CT AT CT AAAAGT GT T AT TTTT G AT GG C G AAC C AAG AAT T GC G C AAG AT A 
AAG C G G C GAT TTCTTGTC C AG AT C AC AAC T G T C AG T T AAAT T C T G CT ATT 
GTAGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTA 
CTT T G C AG G AG AT AAGAC AAT G T CT G AG GT GG AG G AAAAC C T AGT CC T T G 
GTT T AGCG C AAAT AT T T T C AGGAC AAC T G G C AAT GGGG AT AAC AG AG G AA 
CAAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAAT 
CAACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCC 
GT AT TG AT T C T GAT AAAG C ACGT T AT G C ACT GAT G C AGT T AAGT ACT T TT 
TTTAGAACAAGTTTGCAAGGTGGTCAGGATCGTGAGGTAACGCTTGAGCA 
AGAAAAAT C AC AT GT GG AT GCT T AT AT GAAT GT T G AAAAAT T AC GT T T CC 
CTGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAG 
T T AC C G C C T T T T GGT T T AC AG GT ACT GG T AG AG AAT G C AGT T C GAC AT GC 
T T T T AAAG AACGT AAG ACGG AC AAC C AT AT AT T GGT T C AAAT AAAGC C AG 
ATGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCA 
GAT AC TAT CAT T GAT AAAT TAG GT C AAG AAAC AGT T G C AG AG AG T AAG G G 
T AC AGG T AC T G CT CT AGT T AAT C T AAAT AAC AG GCT GAAT T TAT TAT AT G 
GT AGT G T AAGT T GC C T T CAT T T T T C G AG CG AC AAG AAT G GT AC AAAAGT T 
T GGT AT CGAAT ACCT AAT AGAAT AAGGGAGGATGAGC AT GAAAATTTTAA 
TTCT 

SEQ ID NO. 5810 

STRAIN JM9130013 

TTGATGGTGTTGTTATTCCAAAGGCTAGGAATTATT 

ATGATTTTAGCCTTTTTATTGGTAAATAAT AGTTATTTCAGACAGTT AAT 
TGAAGAGCGGTCTAAACGTGAAACGGTAGTCCTTGTCATCATTTTCGGCT 
T GT T T GT TAT TAT AT C T AAT AT AAC AG GAAT T GAAAT AAAAGG G GAT C G A 
AGTTTGGTCGAGCGCCCTTTTCTAACAACGATTTCTCATTCTGACTCACT 
TGCTAATACAAGGACTTTAGTTATTACAACGGCAAGTTTGGTTGGTGGAC 
CTCTGGTTGGATCAATTGTTGGTTTTATTGGAGGAGTTCATCGCTTTTTT 
CAAGGAAGCTTTTCAGGTTCTTTCTATATTGTCAGTTCAGTTCTAGTCGG 
CAT T GT T AG C GG AAAG AT TG GT G AT AAG C T T AAG G AAAAC CAT C T CT AC C 
CTTCAACAAGCCAAGTTATTTTAATTAGTATTATTGCCGAAAGTATCCAG 
ATGCTATTTGTTGGCATTTTTACAGGATGGGAACTTGTCAAAATGATTGT 
CATTCCAATGATGATTTTAAATAGTTTAGGTTCCACACTTTTCCTTGCGA 
TTTTGAAAACTTATTTGTCAAATGAAAGTCAGTTACGCGCAGTTCAAACG 
AGAGATGTTCTTGAATTGACTCGACAGACTCTGCCCTACCTTAGACAAGG 
TTTGACACCGCAATCTGCTAGGAGCGTTTGCGAAATTATAAAGAGGCATA 
CTAACTTTGATGCTGTGGGATTAACAGATCGGTCAAACGTATTAGCTCAT
ATTGGTGTTGGCCATGATCACCATATTGCAGGACAACCGGTCAAAACAGA 
CT T AT C T AAAAGT GT T AT T T T T GAT G G C G AAC C AAG AAT T G CG C AAG AT A 
AAGCGGCGATTTCTTGTCCAGATCACAACTGTCAGTTAAATTCTGCTATT 
GTAGTTCCTCTAAAAATAAATGATAAAACTGTGGGTGCCTTAAAAATGTA 
CTTTGCAGGAGATAAGACAATGTCTGAGGTGGAGGAAAACCTAGTCCTTG 
GTTTAGCGCAAATATTTTCAGGACAACTGGCAATGGGGATAACAGAGGAA 
CAAAATAAGTTAGCCAGTATGGCAGAGATAAAGGCTTTACAAGCACAAAT 
CAACCCTCATTTCTTCTTTAATGCCATTAACACAATTAGTGCATTAATCC 
GTATTGATTCTGATAAAGCACGTTATGCACTGATGCAGTTAAGTACTTTT 
TTTAGAACAAGTTTGCAGGGTGGTCAGGATCGTGAGGTAACGCTTGAGCA
agAAAAATCACATGTGGATGCTTATATGAATGTTGAAAAATTACGTTTCC
CTGATAAATATCAGTTATCTTATGATATTAGTGCACCAGAAAAAATGAAG
TTACCACCTTTTGGTTTACAGGTACTGGTAGAGAATGCAGTTCGACATGC
TTTCAAAGAACGTAAGACGGACAACCATATATTGGTTCAAATAAAGCCAG
ATGGTCATTATTATTGTGTTTCTGTTAGTGACAATGGACAAGGAATCTCA
GATACTATCATTGATAAATTAGGTCAAGAAACAGTTGCAGAGAGTAAGGG
TACAGGTACTGCTCTAGTTAATCTAAATAACAGGCTGAATTTATTATATG
GTAGTGTAAGTTGCCTTCATTTTTCGAGCGACAAGAATGGTACAAAAGTT
TGGTATCGAATACCTAATAGAATAAGGGAGGATGAGCATGAAAATTTTAA 
TTCT 

SEQ ID NO. 5811 
STRAIN 2603 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5812 

STRAIN 090 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5813 

STRAIN A909 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5814 

STRAIN H36B frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5815 

STRAIN 18RS21 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5816 

STRAIN M732 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGIGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVCALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5817 

STRAIN COH1 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVCALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5818 

STRAIN M781 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVCALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT 
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5819 

STRAIN CJB110 frame: 1

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR 
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5820 

STRAIN 1169NT frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS 

SEQ ID NO. 5821 

STRAIN JM9130013 frame: 1 

LMVLLFQRLGIIMILAFLLVNNSYFRQLIEERSKRETVVLVIIFGLFVIISNITGIEIKG
DRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGSF 
YIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFTGWELVKMIVI 
PMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQSARSVCE 
IIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQDKAAIS 
CPDHNCQLNSAIVVPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQLAMGIT
EEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFRTSLQGG 
QDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDISAPEKMKLPPFGLQVLVENAVRHAF 
KERKTDNHILVQIKPDGHYYCVSVSDNGQGISDTIIDKLGQETVAESKGTGTALVNLNNR
LNLLYGSVSCLHFSSDKNGTKVWYRIPNRIREDEHENFNS

SEQ ID NO. 5901 
STRAIN 2603 

ATGAATAAAAGAAGAAAATTATCAAAATTGAATGTAAAAAAACATCATTTAGCTTATGGA 
GCTATCACTTTAGTAGCCCTTTTTTCATGTATTTTGGCTGTAATGGTCATCTTTAAAAGT
TCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCA
AAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCT
TCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAG
CAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACC
CCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCT 
CAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATAcTGCAGGGGCTATTGGCTCA 
GCAGCTGCAGCACAAATGGCTGCTGCAAcAGGAGTCCCTCAGTCTACTTGGGAAcATATT 
ATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTT
TTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCT
ATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTACTAG 

SEQ ID NO. 5902 

STRAIN JM9130013 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAA

AGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGAATAAGGCAACAT 
CTAAATCAAAAGTAGAAGGTGTAAAACAGGCTCCAAAACCAAGTTCTCAA
TCTACAGAAGCTAATTCTCAGCAACAAGTTAGTGCGAGTGAAGAGGCAGC
TGTAGAACAAGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAAGCAC
AACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG 
CCGAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGTTATTGGCTC
AGCAGCAGCAGCACAAATGGCTGCTGCAACGGGAGTTCCTCAGTCTACTT 
GGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAACGTTGCTAAT 
GCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAAC
AGCTACAGTTCAGGATCAAGTTAATtCAGCTATTAAAGCTTATCGTGCTC
AAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 5903 

STRAIN 1169NT reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCC
AAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCT
CCAAAACCTTCTCAGGCATCTAATGAAGTCCCAAAATCAAGTTCTCAATCTACAGAAGCT 
AATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAACA 
GAAAATACCCCTGCTAGCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTAC
AAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGCG
GTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGG 
GAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCT 
TCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTT
AATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 5904 

STRAIN 18RS21 reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTC
GCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAA 
AACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTA
CAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAG
TTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGA
CAACTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTG 
CAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGT
CTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCT 
CAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGG 
ATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTAC

SEQ ID NO. 5905 

STRAIN 090 reverse complement 

TAGCCAAAAAATCAAAAATGATTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAAC
AGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAG
AAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTG
TAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAA
CTTATAGACCTGCTCAACACCAGACGAGTGGCCAAGTATTGAGTAATGGAAATACTGCAG 
GGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTA 
CTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAG 
GAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGA 

SEQ ID NO. 5906 

STRAIN A909 reverse complement

AAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCA
TCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTT
ACTGCGAGTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACC
AGTCAGGCACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAG
ACAAGTGGCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCA 
GCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGT 
GAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACG 
ATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGAATCAAGTTAATTCAGCTATTAAAGCT
TATCGTGCTCAAGGTTTATCA 

SEQ ID NO. 5907 

STRAIN CJB110 reverse complement 

AATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGA
CATCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATG
AAGCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGA
GTGAAGAGGCAGCTGTAGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGG
CACAACAAGCTTATGCTGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACGAGTG
GCCAAGTATTGAGTAATGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAA 
TGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAA 
ATGGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAG 
GTTGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTG 
CTCAAGGTTTATCAGCTTGGGGTTAC

SEQ ID NO. 5908

STRAIN COH1 reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAA

AGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGA
TGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCA 
ATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACA
AGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTAC
TGAGACAACTTACAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAA
TACTGCAGGGGCGGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCC 
TCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAA 
TGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGT

TCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGG 
TTAC 

SEQ ID NO. 5909 

STRAIN H36B reverse complement

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGC 

AGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGT
AGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAG
TTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTAGTGCGAGTGAAGAGGCAGCTGT
AGAACAAGCAGTTGTAACAGAAAACACCCCTGCTACCAGTCAGGCACAACAAGCTTATGC
TGTTACTGAGACAACTTATAGACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGTAA
TGGAAATACTGCAGGGGCTATTGGCTCAGCAGCTGCAGCACAAATGGCTGCTGCAACAGG
AGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGT
TGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGC 
TACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTT

SEQ ID NO. 5910 

STRAIN M732 reverse complement 

AAAAGTTCACAAGTTACTACTGAATCTTTGTCAAAAGCAGATAAAGTTCGCGTAGC 
CAAAAAATCAAAAATGACTAAGGCGACATCTAAATCAAAAGTAGAAGATGTAAAACAGGC
TCCAAAACCTTCTCAGGCATCTAATGAAGCCCCAAAATCAAGTTCTCAATCTACAGAAGC
TAATTCTCAGCAACAAGTTACTGCGAGTGAAGAGGCGGCTGTAGAACAAGCAGTTGTAAC
AGAAAATACCCCTGCTACCAGTCAGGCACAACAAACTTATGCTGTTACTGAGACAACTTA
CAAACCTGCTCAACACCAGACAAGTGGCCAAGTATTGAGCAATGGAAATACTGCAGGGGC
GGTCGGATCTGCTGCTGCAGCACAAATGGCTGCTGCAACAGGAGTCCCTCAGTCTACTTG 
GGAACATATTATTGCCCGTGAATCAAATGGTAATCCTAATGTTGCTAATGCCTCAGGAGC
TTCAGGACTTTTCCAAACGATGCCAGGTTGGGGTTCAACAGCTACAGTTCAGGATCAAGT
TAATTCAGCTATTAAAGCTTATCGTGCTCAAGGTTTATCAGCTTGGGGTTA 

SEQ ID NO. 5911 

STRAIN M781 reverse complement 

TCTTTGTCAAAAGCAGATAAAGTTCGCGTAGCCAAAAAATCAAAAATGACTAAGGCGACA
TCTAAATCAAAAGTAGAAGATGTAAAACAGGCTCCAAAACCTTCTCAGGCATCTAATGAA
GCCCCAAAATCAAGTTCTCAATCTACAGAAGCTAATTCTCAGCAACAAGTTACTGCGAGT
GAAGAGGCGGCTGTAGAACAAGCAGTTGTAACAGAAAATACCCCTGCTACCAGTCAGGCA
CAACAAACTTATGCTGTTACTGAGACAACTTACAAACCTGCTCAACACCAGACAAGTGGC 
CAAGTATTGAGCAATGGAAATACTGCAGGGGCGGTCGGATCTGCTGCTGCAGCACAAATG
GCTGCTGCAACAGGAGTCCCTCAGTCTACTTGGGAACATATTATTGCCCGTGAATCAAAT 
GGTAATCCTAATGTTGCTAATGCCTCAGGAGCTTCAGGACTTTTCCAAACGATGCCAGGT 
TGGGGTTCAACAGCTACAGTTCAGGATCAAGTTAATTCAGCTATTAAAGCTTATCGTGCT
CAAGGTTTATCAGCTTGGGGTTAC 

SEQ ID NO. 5912 
STRAIN 2603 frame: 1

MNKRRKLSKLNVKKHHLAYGAITLVALFSCILAVMVIFKSSQVTTESLSKADKVRVAKKS 
KMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENT 
PATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHI 
IARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRAQGLSAWGY 

SEQ ID NO. 5913 

STRAIN 1169NT frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEVPKSSSQSTEAN
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5914 

STRAIN 18RS21 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5915 

STRAIN 2603 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5916 

STRAIN 090 frame: 3 

AKKSKMIKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVV
TENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQST 
WEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQ 

SEQ ID NO. 5917 

STRAIN A909 frame: 1 

KATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENTPAT
SQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHIIAR 
ESNGNPNVANASGASGLFQTMPGWGSTATVQNQVNSAIKAYRAQGLS 

SEQ ID NO. 5918 

STRAIN CJB110 frame: 3 

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS
EEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY 

SEQ ID NO. 5919 

STRAIN COH1 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWGY 

SEQ ID NO. 5920 

STRAIN H36B frame: 1

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAI
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKA 

SEQ ID NO. 5921 

STRAIN M732 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEAN 
SQQQVTASEEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAV
GSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVN 
SAIKAYRAQGLSAWG 

SEQ ID NO. 5922 

STRAIN M781 frame: 4

SLSKADKVRVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTAS 
EEAAVEQAVVTENTPATSQAQQTYAVTETTYKPAQHQTSGQVLSNGNTAGAVGSAAAAQM
AAATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRA 
QGLSAWGY

SEQ ID NO. 5923

STRAIN JM9130013 frame: 1 

KSSQVTTESLSKADKVRVAKKSKMNKATSKSKVEGVKQAPKPSSQSTEANSQQQVTASEE
AAVEQAVVTENTPATSQAQQAYAVTETTYRPAQHQPSGQVLSNGNTAGVIGSAAAAQMAA
ATGVPQSTWEHIIARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRAQG
LSAWGY 

SEQ ID NO. 6001 
STRAIN 2603 

ATGAAAGAAAAACAGTCGAAAAGGCTTATTTATATACTACTGGTTGTTTCCATTATTTTT 
ATAAGTGTTTTTACATACAGTATTAGCCAGCCTTCTAAACTAGTTCCACCAAAAGAATTA
GTTATTCTAAGTCCAAATAGTCAAGCCATTTTAACAGGAACGATTCCAGCTTTTGAGGAA
AAATACGGTATAAAAGTTAAGCTTATTCAAGGTGGGACAGGGCAACTAATAGATAGATTA
AGTAAGGAGGGTAAGCAGTTGAAGGCGGATATTTTCTTTGGAGGAAATTATACGCAATTT
GAAAGTCATAAGGCATTGTTTGAGTCTTACGTATCAAAGAATGTTCATACTGTTATTCCA 
GACTATATCCATCCAAGTGATACGGCGACACCTTATACTATAAATGGGAGTGTCTTGATT
GTAAATAACGAATTAGCTAAGGGACTTACCATCAAGAGTTATGAAGATTTATTACAGCCT
TCCTTAAAAGGTAAAATTGCCTTTGCAGATCCGAATACTTCCTCTAGTGCTTTCTCACAA 
CTCACTAATATACTCTTGGCCAAGGGTGGTTACACCAATCCAAAAGCGTGGAACTATGTT
AAAAAGCTACAACATAATATTAATGCTATCAAATCTTCTAGCTCTTCAGAAGTTTATCAA
TCAGTTGCAGAAGGAAAAATGATTGTGGGGCTGACTTACGAAGACCCTAGTGTCAATTTG 
CAAAAAAGTGGTGCCAATGTTTCTATTGTATATCCGACAGAAGGGACAGTTTTTGTCCCA
TCTTCGGTTGCAATTATAAAGAATGCTCCTTCTATGAAAGAAGCAAAGTTATTTATTAAT
TTTATGCTTTCTTTAGATGTTCAAAATGCCTTTGGGCAGTCAACGAGTAACCGACCTATT
CGTAAAGATGCCCAAACGAGTAATGGCATGAAAGCTTTAAAGGATATTGCTACTCTTAAA
GAAGATTATCGCTATGTCACTAAGCATAAGGGCCAAATCCTTAAAACCTATAATCGTATT
CGTAGAAATGCTGAT

SEQ ID NO. 6002 

STRAIN 090 

CAGCCTTCTAAACTACTTCCACCAAAAGAATTAGTTATTCTAAGT
CCAAATAGTCAAGCCATTTTAACAGGAACGATTCCAGCTTTTGAGGAAAA
ATACGGTATAAAAGTTAAGCTTATTCAAGGTGGGACAGGGCAACTAATAG
ATAGATTAAGTAAGGAGGGTAAGCAGTTGAAGGCGGATATTTTCTTTGGA 
GGAAATTATACGCAATTTGAAAGTCATAAGGCATTGTTTGAGTCTTACGT
ATCAAAGAATGTTCATACTGTTATTCCAGACTATATCCATCCAAGTGATA
CGGCGACACCTTATACTATAAATGGGAGTGTCTTGATTGTAAATAACGAA
TTAGCTAAGGGACTTACCATCAAGAGTTATGAAGATTTATTACAGCCTTC
CTTAAAAGGTAAAATTGCCTTTGCAGATCCGAATACTTCCTCTAGTGCTT
TCTCACAACTCACTAATATACTCTTGGCCAAGGGTGGTTACACCAATCCA
AAAGCGTGGAACTATGTTAAAAAGCTACAACATAATATTAATGCTATCAA
ATCTTCTAGCTCTTCAGAAGTTTATCAATCAGTTGCAGAAGGAAAAATGA
TTGTGGGGCTGACTTACGAAGACCCTAGTGTCAATTTGCAAAAAAGTGGT 
GCCAATGTTTCTATTGTATATCCGACAGAAGGGACAGTTTTTGTCCCATC
TTCGGTTGCAATTATAAAGAATGCTCCTTCTATGAAAGAAGCAAAGTTAT
TTATTAATTTTATGCTTtCTTTAgATGTTCAAAATGCCTTTGGGCAGTCA 
ACGAGTAACCGACCTATTCGTAAAGATGCCCAAACGAGTAATGGCATGAA
AGCTTTAAAGGATATTGCTACTCTTAAAGAAGATTATCGCTATGTCACTA
AGCATAAGGGCCAAATCCTTAAAACCTATAATCGTATTCGTAGAAATGCT
GAT 

SEQ ID NO. 6003 

STRAIN A909

CAGCCTTCTAAACTACTTCCACCAAAAGAATTAG

TTATTCTAAGTCCAAATAGTCAAGCCATTTTAACAGGAACGATTCCAGCT
TTTGAGGAAAAATACGGTATAAAAGTTAAGCTTATTCAAGGTGGGACAGG 
TCAACTAATAGATAGATTAAGTAAGGAGGGTAAGCAGTTGAAGGCGGATA
TTTTCTTTGGAGGAAATTATACGCAATTTGAAAGTCATAAGGCATTGTTT
GAGTCTTACGTATCAAAGAATATTCATACTGTTATTCCAGATTATATCCA
TCCGAGTGATACGGCGACACCTTATACTATAAATGGGAGTGTCTTGATTG 
TAAATAACGAATTAGCTAAGGGACTTACCATCAAGAGTTATGAAGATTTA
TTACAGCCTTCCTTAAAAGGTAAAATTGCCTTTGCAGATCCGAATACTTC 
CTCTAGTGCTTTCTCACAACTCACTAATATACTCTTGGCCAAGGGTGGTT